diff --git a/PaddleCV/3d_vision/SMOKE/README.md b/PaddleCV/3d_vision/SMOKE/README.md
new file mode 100755
index 0000000000000000000000000000000000000000..3bdf2c369d6e0282fb312fdb950016e79715e66d
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/README.md
@@ -0,0 +1,149 @@
+# SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
+
+## Requirements
+All code is tested under the following environment:
+* CentOS 7.5
+* Python 3.7
+* PaddlePaddle 2.1.0
+* CUDA 10.2
+
+## Preparation
+1. PaddlePaddle installation
+```bash
+
+conda create -n paddle_latest python=3.7
+
+conda activate paddle_latest
+
+pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+
+pip install -r requirement.txt
+```
+
+2. Dataset preparation
+Please first download the dataset and organize it in the following structure:
+```
+kitti
+│──training
+│   ├──calib
+│   ├──label_2
+│   ├──image_2
+│   └──ImageSets
+└──testing
+    ├──calib
+    ├──image_2
+    └──ImageSets
+```
+Then make a soft link to the KITTI dataset and put it under the `datasets/` folder.
+```bash
+mkdir datasets
+ln -s path_to_kitti datasets/kitti
+```
+
+Note: If you want to use the Waymo dataset for training, you should also organize it following the above structure.
+
+
+3. Compile the KITTI evaluation code
+```bash
+cd tools/kitti_eval_offline
+g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_offline_40p.cpp
+```
+Note: evaluate\_object\_offline\_40p/11p.cpp stands for 40-point/11-point evaluation.
+For further details please refer to [KITTI 3D Object Dataset](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) and [Disentangling Monocular 3D Object Detection](https://arxiv.org/abs/1905.12365).
+For 11-point evaluation, simply change `evaluate_object_offline_40p.cpp` to `evaluate_object_offline_11p.cpp`.
+
+
+
+## Training
+
+Please download the [pre-trained weights](https://bj.bcebos.com/paddleseg/3d/smoke/dla34.pdparams) and put them into ```./pretrained```.
+
+
+#### Single GPU
+```bash
+python train.py --config configs/train_val_kitti.yaml --log_iters 100 --save_interval 5000 --num_workers 2
+```
+#### Multiple GPUs
+
+Take two GPUs as an example.
+```bash
+export CUDA_VISIBLE_DEVICES="6,7" && python -m paddle.distributed.launch train.py --config configs/train_val_kitti.yaml --log_iters 100 --save_interval 5000 --num_workers 2
+```
+#### VisualDL
+Run the following command, then view the training visualization in a browser.
+```bash
+visualdl --logdir ./output
+```
+
+## Evaluation
+
+```bash
+python val.py --config configs/train_val_kitti.yaml --model_path path-to-model/model.pdparams --num_workers 2
+```
+
+The performance on KITTI 3D detection is as follows:
+
+| Category    | Easy  | Moderate    | Hard   |
+|-------------|:-----:|:-----------:|:------:|
+| Car         | 6.51  | 4.98        | 4.63   |
+| Pedestrian  | 4.44  | 3.73        | 2.99   |
+| Cyclist     | 1.40  | 0.57        | 0.60   |
+
+The performance on WAYMO 3D detection is as follows:
+
+| Category    | Easy  | Moderate    | Hard   |
+|-------------|:-----:|:-----------:|:------:|
+| Car         | 6.17  | 5.74        | 5.74   |
+| Pedestrian  | 0.35  | 0.34        | 0.34   |
+| Cyclist     | 0.54  | 0.53        | 0.53   |
+
+Download the trained models here: [smoke-release](https://bj.bcebos.com/paddleseg/3d/smoke/smoke-release.zip).
+
+
+## Testing
+
+Please download and uncompress the above model weights first.
+```bash
+python test.py --config configs/test_export.yaml --model_path path-to-model/model_waymo.pdparams --input_path examples/0615037.png --output_path paddle.png
+```
+
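+If you prefer to drive the test step from Python rather than the CLI, the same
+building blocks used by `test.py` and `deploy/export.py` can be composed
+directly. A minimal sketch (the config and weight paths are placeholders):
+
+```python
+# Minimal sketch: build the SMOKE model from a yaml config and load trained
+# weights, mirroring the first steps of deploy/export.py.
+from smoke.cvlibs import Config
+from smoke.utils import load_pretrained_model
+
+cfg = Config('configs/test_export.yaml')   # parse the yaml config
+model = cfg.model                          # built from the `model` section
+load_pretrained_model(model, 'path-to-model/model_waymo.pdparams')
+model.eval()                               # switch to inference mode
+```
+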
+## Model Deployment
+
+1. Convert to a static-graph model
+```bash
+export PYTHONPATH="$PWD"
+```
+```bash
+python deploy/export.py --config configs/test_export.yaml --model_path path-to-model/model_waymo.pdparams
+```
+
+Running the above command will generate three files in ```./deploy```: 1) inference.pdmodel, which stores the model graph/structure, 2) inference.pdiparams, which stores the trained parameters of the model, and 3) inference.pdiparams.info, which includes extra metadata about the model.
+
+2. Visualize the model structure.
+```bash
+visualdl --model deploy/inference.pdmodel
+```
+
+The above command can be a little slow. Alternatively, start VisualDL first via the following command, and then open the pdmodel file locally.
+```bash
+visualdl
+```
+Note: If you are using a remote server, please specify ```--host```, e.g. 10.9.189.6
+```bash
+visualdl --model deploy/inference.pdmodel --host 10.9.189.6
+```
+
+3. Python inference on the converted model.
+
+Now you can run inference anywhere without the repo. We provide an example for Python inference.
+
+```bash
+python deploy/infer.py --model_file deploy/inference.pdmodel --params_file deploy/inference.pdiparams --input_path examples/0615037.png --output_path paddle.png
+```
+
+
+## Reference
+
+> Liu, Zechen, Zizhang Wu, and Roland Tóth. "SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 996-997. 2020.
diff --git a/PaddleCV/3d_vision/SMOKE/configs/_base_/kitti.yaml b/PaddleCV/3d_vision/SMOKE/configs/_base_/kitti.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d07980a6f17898f4b0be6af142ceb985e5e193c0
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/configs/_base_/kitti.yaml
@@ -0,0 +1,30 @@
+batch_size: 8
+iters: 70000
+
+train_dataset:
+  type: KITTI
+  dataset_root: datasets/kitti/training
+  transforms:
+    - type: Normalize
+      mean: [0.485, 0.456, 0.406]
+      std: [0.229, 0.224, 0.225]
+  flip_prob: 0.5
+  aug_prob: 0.3
+  mode: train
+
+val_dataset:
+  type: KITTI
+  dataset_root: datasets/kitti/training
+  transforms:
+    - type: Normalize
+      mean: [0.485, 0.456, 0.406]
+      std: [0.229, 0.224, 0.225]
+  mode: val
+
+optimizer:
+  type: Adam
+
+lr_scheduler:
+  type: MultiStepDecay
+  milestones: [36000, 55000]
+  learning_rate: 1.25e-4
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/configs/test_export.yaml b/PaddleCV/3d_vision/SMOKE/configs/test_export.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..82107b9a2987ad9e6238caced143c959a4fa2e16
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/configs/test_export.yaml
@@ -0,0 +1,5 @@
+_base_: 'train_val_kitti.yaml'
+
+model:
+  post_process:
+    type: PostProcessorHm
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/configs/train_val_kitti.yaml b/PaddleCV/3d_vision/SMOKE/configs/train_val_kitti.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..0d3be986248fe98952962bb4ec7125a38b0f3eb2
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/configs/train_val_kitti.yaml
@@ -0,0 +1,33 @@
+_base_: '_base_/kitti.yaml'
+
+model:
+  type: SMOKE
+  #pretrained: null
+  backbone:
+    type: DLA34
+    pretrained: "pretrained/dla34.pdparams"
+  head:
+    type: SMOKEPredictor
+    num_classes: 3
+    reg_heads: 10
+    reg_channels: [1, 2, 3, 2, 2]
+    num_chanels: 256
+    norm_type: "gn"
+    in_channels: 64
+  post_process:
+    type: PostProcessor
+    depth_ref: [28.01, 16.32]
+    dim_ref: [[3.88, 1.63, 1.53], [1.78, 1.70, 0.58], [0.88, 1.73, 0.67]]
+    reg_head: 
10 + det_threshold: 0.25 + max_detection: 50 + pred_2d: True + +loss: + type: SMOKELossComputation + depth_ref: [28.01, 16.32] + dim_ref: [[3.88, 1.63, 1.53], [1.78, 1.70, 0.58], [0.88, 1.73, 0.67]] + reg_loss: "DisL1" + loss_weight: [1., 10.] + max_objs: 50 + diff --git a/PaddleCV/3d_vision/SMOKE/deploy/export.py b/PaddleCV/3d_vision/SMOKE/deploy/export.py new file mode 100644 index 0000000000000000000000000000000000000000..d03644369e69b3a4362ad929f7268752de1b9e6f --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/deploy/export.py @@ -0,0 +1,76 @@ + +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import os + +import paddle + +from smoke.cvlibs import Config +from smoke.utils import load_pretrained_model + +def parse_args(): + parser = argparse.ArgumentParser(description='Model Export') + + # params of evaluate + parser.add_argument( + "--config", dest="cfg", help="The config file.", required=True, type=str) + parser.add_argument( + '--model_path', + dest='model_path', + help='The path of model for evaluation', + type=str, + required=True) + parser.add_argument( + '--output_dir', + dest='output_dir', + help='The directory saving inference params.', + type=str, + default="./deploy") + + + return parser.parse_args() + + +def main(args): + + cfg = Config(args.cfg) + + model = cfg.model + model.eval() + + load_pretrained_model(model, args.model_path) + + model = paddle.jit.to_static(model, + input_spec=[ + paddle.static.InputSpec( + shape=[1, 3, None, None], dtype="float32", + ), + [ + paddle.static.InputSpec( + shape=[1, 3, 3], dtype="float32" + ), + paddle.static.InputSpec( + shape=[1, 2], dtype="float32" + ) + ] + ] + ) + + paddle.jit.save(model, os.path.join(args.output_dir, "inference")) + +if __name__ == '__main__': + args = parse_args() + main(args) \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/deploy/infer.py b/PaddleCV/3d_vision/SMOKE/deploy/infer.py new file mode 100644 index 0000000000000000000000000000000000000000..0319c39c28aa43e62513afaaf2e2df577a554d8f --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/deploy/infer.py @@ -0,0 +1,156 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
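+#
+# deploy/infer.py: a standalone Paddle Inference example for the exported
+# SMOKE model. It loads the static-graph program, preprocesses the input
+# image, runs the predictor and draws the predicted 3D boxes.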
+
+import numpy as np
+import argparse
+
+import cv2
+import paddle
+from paddle.inference import Config
+from paddle.inference import create_predictor
+from smoke.utils.vis_utils import encode_box3d, draw_box_3d
+
+def get_ratio(ori_img_size, output_size, down_ratio=(4, 4)):
+    return np.array([[down_ratio[1] * ori_img_size[1] / output_size[1],
+                      down_ratio[0] * ori_img_size[0] / output_size[0]]], np.float32)
+
+def get_img(img_path):
+    img = cv2.imread(img_path)
+    ori_img_size = img.shape
+    img = cv2.resize(img, (960, 640))
+    output_size = img.shape
+    img = img / 255.0
+    img = np.subtract(img, np.array([0.485, 0.456, 0.406]))
+    img = np.true_divide(img, np.array([0.229, 0.224, 0.225]))
+    img = np.array(img, np.float32)
+    img = img.transpose(2, 0, 1)
+    img = img[None, :, :, :]
+
+    return img, ori_img_size, output_size
+
+def init_predictor(args):
+    if args.model_dir != "":
+        config = Config(args.model_dir)
+    else:
+        config = Config(args.model_file, args.params_file)
+
+    config.enable_memory_optim()
+    if args.use_gpu:
+        config.enable_use_gpu(1000, 0)
+    else:
+        # For CPU inference, set the math library thread num; it should not
+        # be greater than the number of cores in the CPU. MKLDNN is enabled
+        # to speed up CPU execution.
+        config.set_cpu_math_library_num_threads(4)
+        config.enable_mkldnn()
+
+    predictor = create_predictor(config)
+    return predictor
+
+
+def run(predictor, img):
+    # copy img data to input tensor
+    input_names = predictor.get_input_names()
+    for i, name in enumerate(input_names):
+        input_tensor = predictor.get_input_handle(name)
+        input_tensor.reshape(img[i].shape)
+        input_tensor.copy_from_cpu(img[i].copy())
+
+    # do the inference
+    predictor.run()
+
+    results = []
+    # get out data from output tensor
+    output_names = predictor.get_output_names()
+    for i, name in enumerate(output_names):
+        output_tensor = predictor.get_output_handle(name)
+        output_data = output_tensor.copy_to_cpu()
+        results.append(output_data)
+
+    return results
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model_file",
+        type=str,
+        default="./inference.pdmodel",
+        help="Model filename. Specify this when your model is a combined model."
+    )
+    parser.add_argument(
+        "--params_file",
+        type=str,
+        default="./inference.pdiparams",
+        help=
+        "Parameter filename. Specify this when your model is a combined model."
+    )
+    parser.add_argument(
+        "--model_dir",
+        type=str,
+        default="",
+        help=
+        "Model dir. If you load a non-combined model, specify the directory of the model."
+    
+ ) + parser.add_argument( + '--input_path', + dest='input_path', + help='The image path', + type=str, + required=True) + parser.add_argument( + '--output_path', + dest='output_path', + help='The result path of image', + type=str, + required=True) + parser.add_argument("--use_gpu", + type=int, + default=0, + help="Whether use gpu.") + return parser.parse_args() + +if __name__ == '__main__': + args = parse_args() + pred = init_predictor(args) + K = np.array([[[2055.56, 0, 939.658], [0, 2055.56, 641.072], [0, 0, 1]]], np.float32) + K_inverse = np.linalg.inv(K) + + img_path = args.input_path + img, ori_img_size, output_size = get_img(img_path) + ratio = get_ratio(ori_img_size, output_size) + + results = run(pred, [img, K_inverse, ratio]) + + total_pred = paddle.to_tensor(results[0]) + + keep_idx = paddle.nonzero(total_pred[:, -1] > 0.25) + total_pred = paddle.gather(total_pred, keep_idx) + + if total_pred.shape[0] > 0: + pred_dimensions = total_pred[:, 6:9] + pred_dimensions = pred_dimensions.roll(shifts=1, axis=1) + pred_rotys = total_pred[:, 12] + pred_locations = total_pred[:, 9:12] + bbox_3d = encode_box3d(pred_rotys, pred_dimensions, pred_locations, paddle.to_tensor(K), (1280, 1920)) + else: + bbox_3d = total_pred + + + img_draw = cv2.imread(img_path) + for idx in range(bbox_3d.shape[0]): + bbox = bbox_3d[idx] + bbox = bbox.transpose([1,0]).numpy() + img_draw = draw_box_3d(img_draw, bbox) + + cv2.imwrite(args.output_path, img_draw) + diff --git a/PaddleCV/3d_vision/SMOKE/docs/paddle.png b/PaddleCV/3d_vision/SMOKE/docs/paddle.png new file mode 100644 index 0000000000000000000000000000000000000000..1e349216564430725e440751f26c02d95493057e Binary files /dev/null and b/PaddleCV/3d_vision/SMOKE/docs/paddle.png differ diff --git a/PaddleCV/3d_vision/SMOKE/examples/0615037.png b/PaddleCV/3d_vision/SMOKE/examples/0615037.png new file mode 100644 index 0000000000000000000000000000000000000000..39d36c8a88aba19b4d5681819b712df52df23da8 Binary files /dev/null and b/PaddleCV/3d_vision/SMOKE/examples/0615037.png differ diff --git a/PaddleCV/3d_vision/SMOKE/pretrained/README.md b/PaddleCV/3d_vision/SMOKE/pretrained/README.md new file mode 100644 index 0000000000000000000000000000000000000000..09f4035f319917c5814fc70526b0718ae6663eb2 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/pretrained/README.md @@ -0,0 +1 @@ +put the pretrained model here. \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/pretrained/download.py b/PaddleCV/3d_vision/SMOKE/pretrained/download.py new file mode 100644 index 0000000000000000000000000000000000000000..46a07a2b6b0753beac29faa7763cd61bad6c18f1 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/pretrained/download.py @@ -0,0 +1,30 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import sys
+import os
+from io import BytesIO
+import urllib.request
+from zipfile import ZipFile
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+
+if __name__ == "__main__":
+    file_url = "https://bj.bcebos.com/paddleseg/3d/smoke/dla34.pdparams"
+    # Save next to this script so the weights land in pretrained/ regardless
+    # of the current working directory.
+    urllib.request.urlretrieve(file_url, os.path.join(LOCAL_PATH, "dla34.pdparams"))
+
+    smoke_model_path = 'https://bj.bcebos.com/paddleseg/3d/smoke/smoke-release.zip'
+    with urllib.request.urlopen(smoke_model_path) as zipresp:
+        with ZipFile(BytesIO(zipresp.read())) as zfile:
+            zfile.extractall(LOCAL_PATH)
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/requirement.txt b/PaddleCV/3d_vision/SMOKE/requirement.txt
new file mode 100644
index 0000000000000000000000000000000000000000..241e7992e1ff66d93fbb0db10d2b05674b232eb6
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/requirement.txt
@@ -0,0 +1,5 @@
+visualdl
+opencv-python
+scikit-image
+filelock
+tqdm
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..218a37594b71bb4769e25dea2da7d296d8222779
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from . import models, datasets, transforms
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/core/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/core/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..1150dad63907c7757eed613dba6455a6f48acc25
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/core/__init__.py
@@ -0,0 +1,17 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .train import train
+from .val import evaluate
+from .kitti_eval import kitti_evaluation
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/core/kitti_eval.py b/PaddleCV/3d_vision/SMOKE/smoke/core/kitti_eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..b9a1e02a0a39e8a3fb60978cb7fe2c88d087894c
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/core/kitti_eval.py
@@ -0,0 +1,91 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import csv
+import logging
+import subprocess
+import shutil
+
+from smoke.utils.miscellaneous import mkdir
+
+def kitti_evaluation(dataset, predictions, output_dir):
+    """Run the offline KITTI evaluation program on the given predictions
+
+    Args:
+        dataset (paddle.io.Dataset): the evaluation dataset, which provides
+            the label dir and the class-name/id mapping
+        predictions (dict): mapping from image id to prediction tensor
+        output_dir (str): directory where per-image prediction txt files are written
+    """
+    # Clear data dir before do evaluate
+    if os.path.exists(os.path.join(output_dir, 'data')):
+        shutil.rmtree(os.path.join(output_dir, 'data'))
+    predict_folder = os.path.join(output_dir, 'data')  # only recognize data
+    mkdir(predict_folder)
+    type_id_conversion = getattr(dataset, 'TYPE_ID_CONVERSION')
+    id_type_conversion = {value: key for key, value in type_id_conversion.items()}
+    for image_id, prediction in predictions.items():
+        predict_txt = image_id + '.txt'
+        predict_txt = os.path.join(predict_folder, predict_txt)
+
+        generate_kitti_3d_detection(prediction, predict_txt, id_type_conversion)
+
+    output_dir = os.path.abspath(output_dir)
+    root_dir = os.getcwd()
+    os.chdir('./tools/kitti_eval_offline')
+    label_dir = getattr(dataset, 'label_dir')
+    label_dir = os.path.join(root_dir, label_dir)
+
+    if not os.path.isfile('evaluate_object_3d_offline'):
+        # Compile the evaluation binary first and wait for g++ to finish,
+        # otherwise the command below would race against the compilation.
+        subprocess.Popen('g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_offline_40p.cpp', shell=True).wait()
+    command = "./evaluate_object_3d_offline {} {}".format(label_dir, output_dir)
+
+    os.system(command)
+    os.chdir(root_dir)
+
+def generate_kitti_3d_detection(prediction, predict_txt, id_type_conversion):
+    """write kitti 3d detection results to a txt file
+
+    Args:
+        prediction (list[float]): final prediction result
+        predict_txt (str): path to save the result
+        id_type_conversion (dict): mapping from class id to class name
+    """
+    with open(predict_txt, 'w', newline='') as f:
+        w = csv.writer(f, delimiter=' ', lineterminator='\n')
+        if len(prediction) == 0:
+            w.writerow([])
+        else:
+            for p in prediction:
+                p = p.round(4)
+                type = id_type_conversion[int(p[0])]
+                row = [type, 0, 0] + p[1:].tolist()
+                w.writerow(row)
+
+    check_last_line_break(predict_txt)
+
+def check_last_line_break(predict_txt):
+    """strip a trailing line break from the last predicted line
+
+    Args:
+        predict_txt (str): path of predict txt
+    """
+    f = open(predict_txt, 'rb+')
+    try:
+        f.seek(-1, os.SEEK_END)
+    except OSError:
+        pass
+    else:
+        if f.__next__() == b'\n':
+            f.seek(-1, os.SEEK_END)
+            f.truncate()
+    f.close()
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/core/train.py b/PaddleCV/3d_vision/SMOKE/smoke/core/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..de8535f198df9cf16dc1a24ab7dad21f0ee9b7fb
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/core/train.py
@@ -0,0 +1,195 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Copy-paste from PaddleSeg with minor modifications.
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/core/train.py
+"""
+
+import os
+import time
+from collections import deque
+import shutil
+
+import paddle
+import paddle.nn.functional as F
+from visualdl import LogWriter
+
+from smoke.utils import TimeAverager, calculate_eta, logger
+
+def train(model,
+          train_dataset,
+          val_dataset=None,
+          optimizer=None,
+          loss_computation=None,
+          save_dir='output',
+          iters=10000,
+          batch_size=2,
+          resume_model=None,
+          save_interval=1000,
+          log_iters=10,
+          num_workers=0,
+          keep_checkpoint_max=5):
+    """
+    Launch training.
+
+    Args:
+        model(nn.Layer): The model to train (a SMOKE detector).
+        train_dataset (paddle.io.Dataset): Used to read and process training datasets.
+        val_dataset (paddle.io.Dataset, optional): Used to read and process validation datasets.
+        optimizer (paddle.optimizer.Optimizer): The optimizer.
+        loss_computation (nn.Layer): A loss function.
+        save_dir (str, optional): The directory for saving the model snapshot. Default: 'output'.
+        iters (int, optional): How many iters to train the model. Default: 10000.
+        batch_size (int, optional): Mini batch size of one gpu or cpu. Default: 2.
+        resume_model (str, optional): The path of the model to resume from.
+        save_interval (int, optional): How many iters to save a model snapshot once during training. Default: 1000.
+        log_iters (int, optional): Display logging information at every log_iters. Default: 10.
+        num_workers (int, optional): Num workers for data loader. Default: 0.
+        keep_checkpoint_max (int, optional): Maximum number of checkpoints to save. Default: 5.
+    """
+    model.train()
+    nranks = paddle.distributed.ParallelEnv().nranks
+    local_rank = paddle.distributed.ParallelEnv().local_rank
+
+    start_iter = 0
+    if resume_model is not None:
+        start_iter = resume(model, optimizer, resume_model)
+
+    if not os.path.isdir(save_dir):
+        if os.path.exists(save_dir):
+            os.remove(save_dir)
+        os.makedirs(save_dir)
+
+    if nranks > 1:
+        # Initialize parallel environment if not done.
+        
+ if not paddle.distributed.parallel.parallel_helper._is_parallel_ctx_initialized( + ): + paddle.distributed.init_parallel_env() + ddp_model = paddle.DataParallel(model) + else: + ddp_model = paddle.DataParallel(model) + + batch_sampler = paddle.io.DistributedBatchSampler( + train_dataset, batch_size=batch_size, shuffle=True, drop_last=True) + + loader = paddle.io.DataLoader( + train_dataset, + batch_sampler=batch_sampler, + num_workers=num_workers, + return_list=True, + ) + + # VisualDL log + log_writer = LogWriter(save_dir) + + avg_loss = 0.0 + avg_loss_dict = {} + iters_per_epoch = len(batch_sampler) + + reader_cost_averager = TimeAverager() + batch_cost_averager = TimeAverager() + save_models = deque() + batch_start = time.time() + + iter = start_iter + while iter < iters: + for data in loader: + iter += 1 + if iter > iters: + break + reader_cost_averager.record(time.time() - batch_start) + images = data[0] + targets = data[1] + + if nranks > 1: + predictions = ddp_model(images) + else: + predictions = model(images) + + loss_dict = loss_computation(predictions, targets) + loss = sum(loss for loss in loss_dict.values()) + loss.backward() + + optimizer.step() + lr = optimizer.get_lr() + if isinstance(optimizer._learning_rate, + paddle.optimizer.lr.LRScheduler): + optimizer._learning_rate.step() + model.clear_gradients() + avg_loss += loss.numpy()[0] # get the value + if len(avg_loss_dict) == 0: + avg_loss_dict = {k:v.numpy()[0] for k, v in loss_dict.items()} + else: + for key, value in loss_dict.items(): + avg_loss_dict[key] += value.numpy()[0] + + batch_cost_averager.record( + time.time() - batch_start, num_samples=batch_size) + + if (iter) % log_iters == 0 and local_rank == 0: + avg_loss /= log_iters + for key, value in avg_loss_dict.items(): + avg_loss_dict[key] /= log_iters + + remain_iters = iters - iter + avg_train_batch_cost = batch_cost_averager.get_average() + avg_train_reader_cost = reader_cost_averager.get_average() + eta = calculate_eta(remain_iters, avg_train_batch_cost) + logger.info( + "[TRAIN] epoch={}, iter={}/{}, loss={:.4f}, lr={:.6f}, batch_cost={:.4f}, reader_cost={:.5f} | ETA {}" + .format((iter - 1) // iters_per_epoch + 1, iter, iters, + avg_loss, lr, avg_train_batch_cost, + avg_train_reader_cost, eta)) + + ######################### VisualDL Log ########################## + log_writer.add_scalar('Train/loss', avg_loss, iter) + # Record all losses if there are more than 2 losses. 
+                    for key, value in avg_loss_dict.items():
+                        log_tag = 'Train/' + key
+                        log_writer.add_scalar(log_tag, value, iter)
+
+                    log_writer.add_scalar('Train/lr', lr, iter)
+                    log_writer.add_scalar('Train/batch_cost',
+                                          avg_train_batch_cost, iter)
+                    log_writer.add_scalar('Train/reader_cost',
+                                          avg_train_reader_cost, iter)
+                    #################################################################
+
+                    avg_loss = 0.0
+                    avg_loss_dict = {}
+                    reader_cost_averager.reset()
+                    batch_cost_averager.reset()
+
+                if (iter % save_interval == 0 or iter == iters) and local_rank == 0:
+                    current_save_dir = os.path.join(save_dir,
+                                                    "iter_{}".format(iter))
+                    if not os.path.isdir(current_save_dir):
+                        os.makedirs(current_save_dir)
+                    paddle.save(model.state_dict(),
+                                os.path.join(current_save_dir, 'model.pdparams'))
+                    paddle.save(optimizer.state_dict(),
+                                os.path.join(current_save_dir, 'model.pdopt'))
+                    save_models.append(current_save_dir)
+                    if len(save_models) > keep_checkpoint_max > 0:
+                        model_to_remove = save_models.popleft()
+                        shutil.rmtree(model_to_remove)
+
+
+                batch_start = time.time()
+
+
+    # Sleep for half a second to let dataloader release resources.
+    time.sleep(0.5)
+    log_writer.close()
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/core/val.py b/PaddleCV/3d_vision/SMOKE/smoke/core/val.py
new file mode 100644
index 0000000000000000000000000000000000000000..4ba3906e440a2671e7b72d0e8194a7fbc0da956c
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/core/val.py
@@ -0,0 +1,96 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Copy-paste from PaddleSeg with minor modifications.
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/core/val.py
+"""
+
+import os
+import time
+
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+
+from smoke.utils import TimeAverager, calculate_eta, logger, progbar
+from .kitti_eval import kitti_evaluation
+
+
+def evaluate(model,
+             eval_dataset,
+             num_workers=0,
+             output_dir="./output",
+             print_detail=True):
+    """
+    Launch evaluation.
+
+    Args:
+        model(nn.Layer): A model.
+        eval_dataset (paddle.io.Dataset): Used to read and process validation datasets.
+        num_workers (int, optional): Num workers for data loader. Default: 0.
+        output_dir (str, optional): The directory to save prediction results. Default: "./output".
+        print_detail (bool, optional): Whether to print detailed information about the evaluation process. Default: True.
+
+    Returns:
+        None: predictions are written to `output_dir` and scored by the
+            offline KITTI evaluation program.
+    
+ """ + model.eval() + + batch_sampler = paddle.io.BatchSampler( + eval_dataset, batch_size=1, shuffle=False, drop_last=False) + loader = paddle.io.DataLoader( + eval_dataset, + batch_sampler=batch_sampler, + num_workers=num_workers, + return_list=True, + ) + + total_iters = len(loader) + + if print_detail: + logger.info( + "Start evaluating (total_samples={}, total_iters={})...".format( + len(eval_dataset), total_iters)) + progbar_val = progbar.Progbar(target=total_iters, verbose=1) + reader_cost_averager = TimeAverager() + batch_cost_averager = TimeAverager() + batch_start = time.time() + predictions = {} + with paddle.no_grad(): + for cur_iter, batch in enumerate(loader): + reader_cost_averager.record(time.time() - batch_start) + images, targets, image_ids = batch[0], batch[1], batch[2] + + output = model(images, targets) + + output = output.numpy() + predictions.update( + {img_id: output for img_id in image_ids}) + + batch_cost_averager.record( + time.time() - batch_start, num_samples=len(targets)) + batch_cost = batch_cost_averager.get_average() + reader_cost = reader_cost_averager.get_average() + + if print_detail: + progbar_val.update(cur_iter + 1, [('batch_cost', batch_cost), + ('reader cost', reader_cost)]) + reader_cost_averager.reset() + batch_cost_averager.reset() + batch_start = time.time() + + kitti_evaluation(eval_dataset, predictions, output_dir=output_dir) + + diff --git a/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..09b1d74202d2befdc9f5106cfa3605c227c49208 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from . import manager +from . import param_init +from .config import Config \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/config.py b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/config.py new file mode 100644 index 0000000000000000000000000000000000000000..a7568f4f4f4a118339a3ceaf18239e2f5ee08f2c --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/config.py @@ -0,0 +1,245 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Copy-paste from PaddleSeg with minor modifications. 
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/cvlibs/config.py
+"""
+
+import codecs
+import os
+from typing import Any, Dict, Generic
+
+import paddle
+import yaml
+
+from smoke.cvlibs import manager
+from smoke.utils import logger
+
+class Config(object):
+    '''
+    Training configuration parsing. Only yaml/yml files are supported.
+
+    The following hyper-parameters are available in the config file:
+        batch_size: The number of samples per gpu.
+        iters: The total training steps.
+        train_dataset: A training data config including type/data_root/transforms/mode.
+            For the data type, please refer to smoke.datasets.
+            For specific transforms, please refer to smoke.transforms.transforms.
+        val_dataset: A validation data config including type/data_root/transforms/mode.
+        optimizer: An optimizer config. In addition, weight_decay could be set as a regularization.
+        learning_rate: A learning rate config. If decay is configured, the learning_rate value is the starting learning rate,
+            where only poly decay is supported using the config file. In addition, decay power and end_lr are tuned experimentally.
+        loss: A loss config. Multi-loss config is available. The loss type order is consistent with the model outputs,
+            where the coef term indicates the weight of the corresponding loss. Note that the number of coef must be the same as the number of
+            model outputs, and there could be only one loss type if using the same loss type among the outputs, otherwise the number of
+            loss types must be consistent with coef.
+        model: A model config including type/backbone and model-dependent arguments.
+            For the model type, please refer to smoke.models.
+            For the backbone, please refer to smoke.models.backbones.
+
+    Args:
+        path (str) : The path of the config file, supports yaml format only.
+
+    Examples:
+
+        from smoke.cvlibs import Config
+
+        # Create a cfg object with a yaml file path.
+        cfg = Config(yaml_cfg_path)
+
+        # Parsing the argument when its property is used.
+        train_dataset = cfg.train_dataset
+
+        # the argument of model should be parsed after dataset,
+        # since the model builder uses some properties in dataset.
+        model = cfg.model
+        ...
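+
+        # Other components are built lazily from their config sections in
+        # the same way, e.g.:
+        optimizer = cfg.optimizer
+        loss = cfg.loss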
+ ''' + + def __init__(self, + path: str, + learning_rate: float = None, + batch_size: int = None, + iters: int = None): + if not path: + raise ValueError('Please specify the configuration file path.') + + if not os.path.exists(path): + raise FileNotFoundError('File {} does not exist'.format(path)) + + self._model = None + + if path.endswith('yml') or path.endswith('yaml'): + self.dic = self._parse_from_yaml(path) + else: + raise RuntimeError('Config file should in yaml format!') + + self.update( + learning_rate=learning_rate, batch_size=batch_size, iters=iters) + + def _update_dic(self, dic, base_dic): + """ + Update config from dic based base_dic + """ + base_dic = base_dic.copy() + for key, val in dic.items(): + if isinstance(val, dict) and key in base_dic: + base_dic[key] = self._update_dic(val, base_dic[key]) + else: + base_dic[key] = val + dic = base_dic + return dic + + def _parse_from_yaml(self, path: str): + '''Parse a yaml file and build config''' + with codecs.open(path, 'r', 'utf-8') as file: + dic = yaml.load(file, Loader=yaml.FullLoader) + + if '_base_' in dic: + cfg_dir = os.path.dirname(path) + base_path = dic.pop('_base_') + base_path = os.path.join(cfg_dir, base_path) + base_dic = self._parse_from_yaml(base_path) + dic = self._update_dic(dic, base_dic) + return dic + + def update(self, + learning_rate: float = None, + batch_size: int = None, + iters: int = None): + '''Update config''' + if learning_rate: + self.dic['lr_scheduler']['learning_rate'] = learning_rate + + if batch_size: + self.dic['batch_size'] = batch_size + + if iters: + self.dic['iters'] = iters + + @property + def batch_size(self): + return self.dic.get('batch_size', 1) + + @property + def iters(self): + iters = self.dic.get('iters') + if not iters: + raise RuntimeError('No iters specified in the configuration file.') + return iters + + + @property + def train_dataset(self): + train_dataset_cfg = self.dic.get('train_dataset', {}) + if not train_dataset_cfg: + return None + return self._load_object(train_dataset_cfg) + + @property + def val_dataset(self): + val_dataset_cfg = self.dic.get('val_dataset', {}) + if not val_dataset_cfg: + return None + return self._load_object(val_dataset_cfg) + + @property + def model(self): + model_cfg = self.dic.get('model').copy() + if not model_cfg: + raise RuntimeError('No model specified in the configuration file.') + + if not self._model: + self._model = self._load_object(model_cfg) + return self._model + + @property + def lr_scheduler(self) -> paddle.optimizer.lr.LRScheduler: + if 'lr_scheduler' not in self.dic: + raise RuntimeError( + 'No `lr_scheduler` specified in the configuration file.') + params = self.dic.get('lr_scheduler').copy() + if 'type' not in params.keys(): + if "learning_rate" in params.keys(): + logger.warning(''' No decay config! 
The fixed learning rate will be used''')
+                return params["learning_rate"]
+            else:
+                raise RuntimeError(
+                    '`lr_scheduler` is not set properly in the configuration file.')
+
+        lr_type = params.pop('type')
+
+        return getattr(paddle.optimizer.lr, lr_type)(**params)
+
+    @property
+    def optimizer(self):
+        if 'lr_scheduler' in self.dic:
+            lr = self.lr_scheduler
+        else:
+            lr = self.learning_rate
+        args = self.dic.get('optimizer', {}).copy()
+        optimizer_type = args.pop('type')
+
+        return getattr(paddle.optimizer, optimizer_type)(lr, parameters=self.model.parameters(), **args)
+
+    @property
+    def loss(self):
+        loss_cfg = self.dic.get('loss', {}).copy()
+        if not loss_cfg:
+            return None
+        return self._load_object(loss_cfg)
+
+    def _load_component(self, com_name):
+        com_list = [
+            manager.MODELS, manager.BACKBONES, manager.DATASETS,
+            manager.TRANSFORMS, manager.LOSSES, manager.HEADS,
+            manager.POSTPROCESSORS
+        ]
+
+        for com in com_list:
+            if com_name in com.components_dict:
+                return com[com_name]
+        else:
+            raise RuntimeError(
+                'The specified component was not found {}.'.format(com_name))
+
+    def _load_object(self, cfg):
+        cfg = cfg.copy()
+        if 'type' not in cfg:
+            raise RuntimeError('No object information in {}.'.format(cfg))
+
+        component = self._load_component(cfg.pop('type'))
+
+        params = {}
+        for key, val in cfg.items():
+            if self._is_meta_type(val):
+                params[key] = self._load_object(val)
+            elif isinstance(val, list):
+                params[key] = [
+                    self._load_object(item)
+                    if self._is_meta_type(item) else item for item in val
+                ]
+            else:
+                params[key] = val
+
+        return component(**params)
+
+
+    def _is_meta_type(self, item):
+        return isinstance(item, dict) and 'type' in item
+
+    def __str__(self):
+        return yaml.dump(self.dic)
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/manager.py b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/manager.py
new file mode 100644
index 0000000000000000000000000000000000000000..03701ee87f583f8dcb19fdb6979c8e293d6a9728
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/manager.py
@@ -0,0 +1,151 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+This file is modified from PaddleSeg:
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/v2.0/paddleseg/cvlibs/manager.py
+
+"""
+
+import inspect
+from collections.abc import Sequence
+
+
+class ComponentManager:
+    """
+    Implement a manager class to add the new component properly.
+    The component can be added as either class or function type.
+
+    Args:
+        name (str): The name of component.
+
+    Returns:
+        A callable object of ComponentManager.
+
+    Examples 1:
+
+        from smoke.cvlibs.manager import ComponentManager
+
+        model_manager = ComponentManager()
+
+        class AlexNet: ...
+        class ResNet: ...
+
+        model_manager.add_component(AlexNet)
+        model_manager.add_component(ResNet)
+
+        # Or alternatively, pass a sequence:
+        model_manager.add_component([AlexNet, ResNet])
+        print(model_manager.components_dict)
+        # {'AlexNet': <class 'AlexNet'>, 'ResNet': <class 'ResNet'>}
+
+    Examples 2:
+
+        # Or an easier way, using it as a Python decorator, while just add it above the class declaration.
+        from smoke.cvlibs.manager import ComponentManager
+
+        model_manager = ComponentManager()
+
+        @model_manager.add_component
+        class AlexNet: ...
+
+        @model_manager.add_component
+        class ResNet: ...
+
+        print(model_manager.components_dict)
+        # {'AlexNet': <class 'AlexNet'>, 'ResNet': <class 'ResNet'>}
+    """
+
+    def __init__(self, name=None):
+        self._components_dict = dict()
+        self._name = name
+
+    def __len__(self):
+        return len(self._components_dict)
+
+    def __repr__(self):
+        name_str = self._name if self._name else self.__class__.__name__
+        return "{}:{}".format(name_str, list(self._components_dict.keys()))
+
+    def __getitem__(self, item):
+        if item not in self._components_dict.keys():
+            raise KeyError("{} does not exist in available {}".format(
+                item, self))
+        return self._components_dict[item]
+
+    @property
+    def components_dict(self):
+        return self._components_dict
+
+    @property
+    def name(self):
+        return self._name
+
+    def _add_single_component(self, component):
+        """
+        Add a single component into the corresponding manager.
+
+        Args:
+            component (function|class): A new component.
+
+        Raises:
+            TypeError: When `component` is neither class nor function.
+            KeyError: When `component` was added already.
+        """
+
+        # Currently only support class or function type
+        if not (inspect.isclass(component) or inspect.isfunction(component)):
+            raise TypeError(
+                "Expect class/function type, but received {}".format(
+                    type(component)))
+
+        # Obtain the internal name of the component
+        component_name = component.__name__
+
+        # Check whether the component was added already
+        if component_name in self._components_dict.keys():
+            raise KeyError("{} exists already!".format(component_name))
+        else:
+            # Take the internal name of the component as its key
+            self._components_dict[component_name] = component
+
+    def add_component(self, components):
+        """
+        Add component(s) into the corresponding manager.
+
+        Args:
+            components (function|class|list|tuple): Support four types of components.
+
+        Returns:
+            components (function|class|list|tuple): Same with input components.
+        """
+
+        # Check whether the type is a sequence
+        if isinstance(components, Sequence):
+            for component in components:
+                self._add_single_component(component)
+        else:
+            component = components
+            self._add_single_component(component)
+
+        return components
+
+
+MODELS = ComponentManager("models")
+BACKBONES = ComponentManager("backbones")
+HEADS = ComponentManager("heads")
+POSTPROCESSORS = ComponentManager("post_processors")
+DATASETS = ComponentManager("datasets")
+TRANSFORMS = ComponentManager("transforms")
+LOSSES = ComponentManager("losses")
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/param_init.py b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/param_init.py
new file mode 100644
index 0000000000000000000000000000000000000000..a83bf483c7d693866452fd14038a012238785df0
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/cvlibs/param_init.py
@@ -0,0 +1,96 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Copy-paste from PaddleSeg +https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/cvlibs/param_init.py +""" + +import paddle.nn as nn + + +def constant_init(param, **kwargs): + """ + Initialize the `param` with constants. + + Args: + param (Tensor): Tensor that needs to be initialized. + + Examples: + + from paddleseg.cvlibs import param_init + import paddle.nn as nn + + linear = nn.Linear(2, 4) + param_init.constant_init(linear.weight, value=2.0) + print(linear.weight.numpy()) + # result is [[2. 2. 2. 2.], [2. 2. 2. 2.]] + + """ + initializer = nn.initializer.Constant(**kwargs) + initializer(param, param.block) + + +def normal_init(param, **kwargs): + """ + Initialize the `param` with a Normal distribution. + + Args: + param (Tensor): Tensor that needs to be initialized. + + Examples: + + from paddleseg.cvlibs import param_init + import paddle.nn as nn + + linear = nn.Linear(2, 4) + param_init.normal_init(linear.weight, loc=0.0, scale=1.0) + + """ + initializer = nn.initializer.Normal(**kwargs) + initializer(param, param.block) + + +def kaiming_normal_init(param, **kwargs): + """ + Initialize the input tensor with Kaiming Normal initialization. + + This function implements the `param` initialization from the paper + `Delving Deep into Rectifiers: Surpassing Human-Level Performance on + ImageNet Classification ` + by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. This is a + robust initialization method that particularly considers the rectifier + nonlinearities. In case of Uniform distribution, the range is [-x, x], where + .. math:: + x = \sqrt{\\frac{6.0}{fan\_in}} + In case of Normal distribution, the mean is 0 and the standard deviation + is + .. math:: + \sqrt{\\frac{2.0}{fan\_in}} + + Args: + param (Tensor): Tensor that needs to be initialized. + + Examples: + + from paddleseg.cvlibs import param_init + import paddle.nn as nn + + linear = nn.Linear(2, 4) + # uniform is used to decide whether to use uniform or normal distribution + param_init.kaiming_normal_init(linear.weight) + + """ + initializer = nn.initializer.KaimingNormal(**kwargs) + initializer(param, param.block) diff --git a/PaddleCV/3d_vision/SMOKE/smoke/datasets/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/datasets/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..48460e7221dc0770ef40abc9c4aeec36a237fe4f --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/datasets/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
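+#
+# Importing the dataset modules here registers them (e.g. KITTI) with
+# manager.DATASETS, so they can be built from yaml configs by type name.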
+ +from .kitti import KITTI \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/datasets/kitti.py b/PaddleCV/3d_vision/SMOKE/smoke/datasets/kitti.py new file mode 100644 index 0000000000000000000000000000000000000000..8087dfbaa0e24866865cffab531011213bfe58b2 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/datasets/kitti.py @@ -0,0 +1,288 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import csv +import logging +import random + +import paddle +import numpy as np +from PIL import Image + +from smoke.cvlibs import manager +from smoke.transforms import Compose + +from smoke.utils.heatmap_coder import ( + get_transfrom_matrix, + affine_transform, + gaussian_radius, + draw_umich_gaussian, + encode_label +) + + +@manager.DATASETS.add_component +class KITTI(paddle.io.Dataset): + """Parsing KITTI format dataset + + Args: + Dataset (class): + """ + def __init__(self, dataset_root, mode="train", transforms=None, flip_prob=0.5, aug_prob=0.3): + super().__init__() + + self.TYPE_ID_CONVERSION = { + 'Car': 0, + 'Cyclist': 1, + 'Pedestrian': 2, + } + + mode = mode.lower() + + self.image_dir = os.path.join(dataset_root, "image_2") + self.label_dir = os.path.join(dataset_root, "label_2") + self.calib_dir = os.path.join(dataset_root, "calib") + + if mode.lower() not in ['train', 'val', 'trainval', 'test']: + raise ValueError( + "mode should be 'train', 'val', 'trainval' or 'test', but got {}.".format( + mode)) + imageset_txt = os.path.join(dataset_root, "ImageSets", "{}.txt".format(mode)) + + self.is_train = True if mode in ["train", "trainval"] else False + self.transforms = Compose(transforms) + + image_files = [] + for line in open(imageset_txt, "r"): + base_name = line.replace("\n", "") + image_name = base_name + ".png" + image_files.append(image_name) + self.image_files = image_files + self.label_files = [i.replace(".png", ".txt") for i in self.image_files] + self.num_samples = len(self.image_files) + self.classes = ("Car", "Cyclist", "Pedestrian") + + + self.flip_prob = flip_prob if self.is_train else 0.0 + self.aug_prob = aug_prob if self.is_train else 0.0 + self.shift_scale = (0.2, 0.4) + self.num_classes = len(self.classes) + + self.input_width = 1280 + self.input_height = 384 + self.output_width = self.input_width // 4 + self.output_height = self.input_height // 4 + self.max_objs = 50 + + self.logger = logging.getLogger(__name__) + self.logger.info("Initializing KITTI {} set with {} files loaded".format(mode, self.num_samples)) + + def __len__(self): + return self.num_samples + + def __getitem__(self, idx): + # load default parameter here + original_idx = self.label_files[idx].replace(".txt", "") + img_path = os.path.join(self.image_dir, self.image_files[idx]) + img = Image.open(img_path) + anns, K = self.load_annotations(idx) + + center = np.array([i / 2 for i in img.size], dtype=np.float32) + size = np.array([i for i in img.size], dtype=np.float32) + + """ + resize, horizontal flip, and affine 
augmentation are performed here. + since it is complicated to compute heatmap w.r.t transform. + """ + flipped = False + if (self.is_train) and (random.random() < self.flip_prob): + flipped = True + img = img.transpose(Image.FLIP_LEFT_RIGHT) + center[0] = size[0] - center[0] - 1 + K[0, 2] = size[0] - K[0, 2] - 1 + + affine = False + if (self.is_train) and (random.random() < self.aug_prob): + affine = True + shift, scale = self.shift_scale[0], self.shift_scale[1] + shift_ranges = np.arange(-shift, shift + 0.1, 0.1) + center[0] += size[0] * random.choice(shift_ranges) + center[1] += size[1] * random.choice(shift_ranges) + + scale_ranges = np.arange(1 - scale, 1 + scale + 0.1, 0.1) + size *= random.choice(scale_ranges) + + center_size = [center, size] + trans_affine = get_transfrom_matrix( + center_size, + [self.input_width, self.input_height] + ) + trans_affine_inv = np.linalg.inv(trans_affine) + img = img.transform( + (self.input_width, self.input_height), + method=Image.AFFINE, + data=trans_affine_inv.flatten()[:6], + resample=Image.BILINEAR, + ) + + + trans_mat = get_transfrom_matrix( + center_size, + [self.output_width, self.output_height] + ) + + if not self.is_train: + # for inference we parametrize with original size + target = {} + target["image_size"] = size + target["is_train"] = self.is_train + target["trans_mat"] = trans_mat + target["K"] = K + if self.transforms is not None: + img, target = self.transforms(img, target) + + return np.array(img), target, original_idx + + heat_map = np.zeros([self.num_classes, self.output_height, self.output_width], dtype=np.float32) + regression = np.zeros([self.max_objs, 3, 8], dtype=np.float32) + cls_ids = np.zeros([self.max_objs], dtype=np.int32) + proj_points = np.zeros([self.max_objs, 2], dtype=np.int32) + p_offsets = np.zeros([self.max_objs, 2], dtype=np.float32) + c_offsets = np.zeros([self.max_objs, 2], dtype=np.float32) + dimensions = np.zeros([self.max_objs, 3], dtype=np.float32) + locations = np.zeros([self.max_objs, 3], dtype=np.float32) + rotys = np.zeros([self.max_objs], dtype=np.float32) + reg_mask = np.zeros([self.max_objs], dtype=np.uint8) + flip_mask = np.zeros([self.max_objs], dtype=np.uint8) + bbox2d_size = np.zeros([self.max_objs, 2], dtype=np.float32) + + for i, a in enumerate(anns): + if i == self.max_objs: + break + a = a.copy() + cls = a["label"] + + locs = np.array(a["locations"]) + rot_y = np.array(a["rot_y"]) + if flipped: + locs[0] *= -1 + rot_y *= -1 + + point, box2d, box3d = encode_label( + K, rot_y, a["dimensions"], locs + ) + if np.all(box2d == 0): + continue + point = affine_transform(point, trans_mat) + box2d[:2] = affine_transform(box2d[:2], trans_mat) + box2d[2:] = affine_transform(box2d[2:], trans_mat) + box2d[[0, 2]] = box2d[[0, 2]].clip(0, self.output_width - 1) + box2d[[1, 3]] = box2d[[1, 3]].clip(0, self.output_height - 1) + h, w = box2d[3] - box2d[1], box2d[2] - box2d[0] + center = np.array([(box2d[0] + box2d[2]) / 2, (box2d[1] + box2d[3]) /2], dtype=np.float32) + + if (0 < center[0] < self.output_width) and (0 < center[1] < self.output_height): + point_int = center.astype(np.int32) + p_offset = point - point_int + c_offset = center - point_int + radius = gaussian_radius(h, w) + radius = max(0, int(radius)) + heat_map[cls] = draw_umich_gaussian(heat_map[cls], point_int, radius) + + cls_ids[i] = cls + regression[i] = box3d + proj_points[i] = point_int + p_offsets[i] = p_offset + c_offsets[i] = c_offset + dimensions[i] = np.array(a["dimensions"]) + locations[i] = locs + rotys[i] = rot_y + reg_mask[i] = 1 if 
not affine else 0 + flip_mask[i] = 1 if not affine and flipped else 0 + + # targets for 2d bbox + bbox2d_size[i, 0] = w + bbox2d_size[i, 1] = h + + target = {} + target["image_size"] = np.array(img.size) + target["is_train"] = self.is_train + target["trans_mat"] = trans_mat + target["K"] = K + target["hm"] = heat_map + target["reg"] = regression + target["cls_ids"] = cls_ids + target["proj_p"] = proj_points + target["dimensions"] = dimensions + target["locations"] = locations + target["rotys"] = rotys + target["reg_mask"] = reg_mask + target["flip_mask"] = flip_mask + target["bbox_size"] = bbox2d_size + target["c_offsets"] = c_offsets + + if self.transforms is not None: + img, target = self.transforms(img, target) + + + return np.array(img), target, original_idx + + + def load_annotations(self, idx): + """load kitti label by given index + + Args: + idx (int): which label to load + + Returns: + (list[dict], np.ndarray(float32, 3x3)): labels and camera intrinsic matrix + """ + annotations = [] + file_name = self.label_files[idx] + fieldnames = ['type', 'truncated', 'occluded', 'alpha', 'xmin', 'ymin', 'xmax', 'ymax', 'dh', 'dw', + 'dl', 'lx', 'ly', 'lz', 'ry'] + + if self.is_train: + if os.path.exists(os.path.join(self.label_dir, file_name)): + with open(os.path.join(self.label_dir, file_name), 'r') as csv_file: + reader = csv.DictReader(csv_file, delimiter=' ', fieldnames=fieldnames) + + for line, row in enumerate(reader): + if (float(row["xmax"]) == 0.) | (float(row["ymax"]) == 0.): + continue + if row["type"] in self.classes: + annotations.append({ + "class": row["type"], + "label": self.TYPE_ID_CONVERSION[row["type"]], + "truncation": float(row["truncated"]), + "occlusion": float(row["occluded"]), + "alpha": float(row["alpha"]), + "dimensions": [float(row['dl']), float(row['dh']), float(row['dw'])], + "locations": [float(row['lx']), float(row['ly']), float(row['lz'])], + "rot_y": float(row["ry"]) + }) + + # get camera intrinsic matrix K + with open(os.path.join(self.calib_dir, file_name), 'r') as csv_file: + reader = csv.reader(csv_file, delimiter=' ') + for line, row in enumerate(reader): + if row[0] == 'P2:': + K = row[1:] + K = [float(i) for i in K] + K = np.array(K, dtype=np.float32).reshape(3, 4) + K = K[:3, :3] + break + + return annotations, K diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..135cc43900e853e2fa7b92b3823aea97c1048a73 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/__init__.py @@ -0,0 +1,20 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
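+#
+# Importing the submodules runs their @manager registrations, which makes
+# backbones, heads, losses and post-processors resolvable by type name
+# from the yaml configs.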
+ +from .backbones import * +from .losses import * +from .heads import * +from .postprocess import * + +from .smoke import SMOKE \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/backbones/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/models/backbones/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..11ab182407b65d1ad09bb764b68a0ad9cbe7ef37 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/backbones/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .dla import * \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/backbones/dla.py b/PaddleCV/3d_vision/SMOKE/smoke/models/backbones/dla.py new file mode 100644 index 0000000000000000000000000000000000000000..ef1c268ff5a2d222a9db4d6963ea51d3748bb184 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/backbones/dla.py @@ -0,0 +1,521 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +import numpy as np + +from smoke.models.layers import group_norm +from smoke.cvlibs import manager +from smoke.utils import pretrained_utils + +__all__ = [ + "DLA", "DLA34" +] + +@manager.BACKBONES.add_component +class DLA(nn.Layer): + + def __init__(self, + levels, + channels, + block, + down_ratio=4, + last_level=5, + out_channel=0, + norm_type="gn", + pretrained=None): + + super().__init__() + + self.pretrained = pretrained + + assert down_ratio in [2, 4, 8, 16] + + self.first_level = int(np.log2(down_ratio)) + self.last_level = last_level + + norm_func = nn.BatchNorm2D if norm_type == "bn" else group_norm + + self.base = DLABase(levels, + channels, + block=eval(block), + norm_func=norm_func) + + scales = [2 ** i for i in range(len(channels[self.first_level:]))] + self.dla_up = DLAUp(startp=self.first_level, + channels=channels[self.first_level:], + scales=scales, + norm_func=norm_func) + + if out_channel == 0: + out_channel = channels[self.first_level] + + up_scales = [2 ** i for i in range(self.last_level - self.first_level)] + self.ida_up = IDAUp(in_channels=channels[self.first_level:self.last_level], + out_channel=out_channel, + up_f=up_scales, + norm_func=norm_func) + self.init_weight() + + def forward(self, x): + + x = self.base(x) + x = self.dla_up(x) + y = [] + iter_levels = range(self.last_level - self.first_level) + for i in iter_levels: + + y.append(x[i].clone()) + + self.ida_up(y, 0, len(y)) + + return y[-1] + + def init_weight(self): + pretrained_utils.load_pretrained_model(self, self.pretrained) + +class DLABase(nn.Layer): + """DLA base module + """ + def __init__(self, + levels, + channels, + block=None, + residual_root=False, + norm_func=None, + ): + super().__init__() + + self.channels = channels + self.level_length = len(levels) + + if block is None: + block = BasicBlock + if norm_func is None: + norm_func = nn.BatchNorm2d + self.base_layer = nn.Sequential( + nn.Conv2D(3, + channels[0], + kernel_size=7, + stride=1, + padding=3, + bias_attr=False), + norm_func(channels[0]), + nn.ReLU() + ) + + self.level0 = _make_conv_level(in_channels=channels[0], + out_channels=channels[0], + num_convs=levels[0], + norm_func=norm_func) + + self.level1 = _make_conv_level(in_channels=channels[0], + out_channels=channels[1], + num_convs=levels[0], + norm_func=norm_func, + stride=2) + + self.level2 = Tree(level=levels[2], + block=block, + in_channels=channels[1], + out_channels=channels[2], + norm_func=norm_func, + stride=2, + level_root=False, + root_residual=residual_root) + + self.level3 = Tree(level=levels[3], + block=block, + in_channels=channels[2], + out_channels=channels[3], + norm_func=norm_func, + stride=2, + level_root=True, + root_residual=residual_root) + + self.level4 = Tree(level=levels[4], + block=block, + in_channels=channels[3], + out_channels=channels[4], + norm_func=norm_func, + stride=2, + level_root=True, + root_residual=residual_root) + + self.level5 = Tree(level=levels[5], + block=block, + in_channels=channels[4], + out_channels=channels[5], + norm_func=norm_func, + stride=2, + level_root=True, + root_residual=residual_root) + + def forward(self, x): + """forward + """ + y = [] + x = self.base_layer(x) + + for i in range(self.level_length): + x = getattr(self, 'level{}'.format(i))(x) + y.append(x) + + return y + +class DLAUp(nn.Layer): + """DLA Up module + """ + def __init__(self, + startp, + channels, + scales, + in_channels=None, + norm_func=None): + """DLA Up module + """ + super(DLAUp, 
self).__init__() + + self.startp = startp + if norm_func is None: + norm_func = nn.BatchNorm2d + + if in_channels is None: + in_channels = channels + self.channels = channels + channels = list(channels) + + scales = np.array(scales, dtype=int) + + for i in range(len(channels) - 1): + j = -i - 2 + setattr(self, + 'ida_{}'.format(i), + IDAUp(in_channels[j:], + channels[j], + scales[j:] // scales[j], + norm_func)) + scales[j + 1:] = scales[j] + in_channels[j + 1:] = [channels[j] for _ in channels[j + 1:]] + + def forward(self, layers): + """forward + """ + out = [layers[-1]] # start with 32 + for i in range(len(layers) - self.startp - 1): + ida = getattr(self, 'ida_{}'.format(i)) + ida(layers, len(layers) - i - 2, len(layers)) + out.insert(0, layers[-1]) + return out + +class BasicBlock(nn.Layer): + """Basic Block + """ + def __init__(self, + in_channels, + out_channels, + norm_func, + stride=1, + dilation=1): + super().__init__() + + self.conv1 = nn.Conv2D(in_channels, + out_channels, + kernel_size=3, + stride=stride, + padding=dilation, + bias_attr=False, + dilation=dilation) + self.norm1 = norm_func(out_channels) + + self.relu = nn.ReLU() + + self.conv2 = nn.Conv2D(out_channels, + out_channels, + kernel_size=3, + stride=1, + padding=dilation, + bias_attr=False, + dilation=dilation + ) + self.norm2 = norm_func(out_channels) + + def forward(self, x, residual=None): + """forward + """ + if residual is None: + residual = x + + out = self.conv1(x) + out = self.norm1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.norm2(out) + + out += residual + out = self.relu(out) + + return out + + +class Tree(nn.Layer): + + def __init__(self, + level, + block, + in_channels, + out_channels, + norm_func, + stride=1, + level_root=False, + root_dim=0, + root_kernel_size=1, + dilation=1, + root_residual=False + ): + super(Tree, self).__init__() + + if root_dim == 0: + root_dim = 2 * out_channels + + if level_root: + root_dim += in_channels + + if level == 1: + self.tree1 = block(in_channels, + out_channels, + norm_func, + stride, + dilation=dilation) + + self.tree2 = block(out_channels, + out_channels, + norm_func, + stride=1, + dilation=dilation) + else: + new_level = level - 1 + self.tree1 = Tree(new_level, + block, + in_channels, + out_channels, + norm_func, + stride, + root_dim=0, + root_kernel_size=root_kernel_size, + dilation=dilation, + root_residual=root_residual) + + self.tree2 = Tree(new_level, + block, + out_channels, + out_channels, + norm_func, + root_dim=root_dim + out_channels, + root_kernel_size=root_kernel_size, + dilation=dilation, + root_residual=root_residual) + if level == 1: + self.root = Root(root_dim, + out_channels, + norm_func, + root_kernel_size, + root_residual) + + self.level_root = level_root + self.root_dim = root_dim + self.level = level + + self.downsample = None + if stride > 1: + self.downsample = nn.MaxPool2D(stride, stride=stride) + + self.project = None + if in_channels != out_channels: + self.project = nn.Sequential( + nn.Conv2D(in_channels, + out_channels, + kernel_size=1, + stride=1, + bias_attr=False), + + norm_func(out_channels) + ) + + def forward(self, x, residual=None, children=None): + """forward + """ + if children is None: + children = [] + + if self.downsample: + bottom = self.downsample(x) + else: + bottom = x + + if self.project: + residual = self.project(bottom) + else: + residual = bottom + + if self.level_root: + children.append(bottom) + x1 = self.tree1(x, residual) + + if self.level == 1: + x2 = self.tree2(x1) + x = self.root(x2, x1, 
*children) + else: + children.append(x1) + x = self.tree2(x1, children=children) + return x + +class Root(nn.Layer): + """Root module + """ + def __init__(self, + in_channels, + out_channels, + norm_func, + kernel_size, + residual): + super(Root, self).__init__() + + self.conv = nn.Conv2D(in_channels, + out_channels, + kernel_size=1, + stride=1, + bias_attr=False, + padding=(kernel_size - 1) // 2) + + self.norm = norm_func(out_channels) + self.relu = nn.ReLU() + self.residual = residual + + def forward(self, *x): + """forward + """ + children = x + x = self.conv(paddle.concat(x, 1)) + x = self.norm(x) + if self.residual: + x += children[0] + x = self.relu(x) + + return x + +class IDAUp(nn.Layer): + """IDAUp module + """ + def __init__(self, + in_channels, + out_channel, + up_f, # todo: what is up_f here? + norm_func): + super().__init__() + + for i in range(1, len(in_channels)): + in_channel = in_channels[i] + f = int(up_f[i]) + + #USE_DEFORMABLE_CONV = False + + # so far only support normal convolution + proj = NormalConv(in_channel, out_channel, norm_func) + node = NormalConv(out_channel, out_channel, norm_func) + + up = nn.Conv2DTranspose(out_channel, + out_channel, + kernel_size=f * 2, + stride=f, + padding=f // 2, + output_padding=0, + groups=out_channel, + bias_attr=False) + # todo: uncommoment later + # _fill_up_weights(up) + + setattr(self, 'proj_' + str(i), proj) + setattr(self, 'up_' + str(i), up) + setattr(self, 'node_' + str(i), node) + + def forward(self, layers, startp, endp): + """forward + """ + for i in range(startp + 1, endp): + + upsample = getattr(self, 'up_' + str(i - startp)) + project = getattr(self, 'proj_' + str(i - startp)) + layers[i] = upsample(project(layers[i])) + node = getattr(self, 'node_' + str(i - startp)) + layers[i] = node(layers[i] + layers[i - 1]) + +class NormalConv(nn.Layer): + """Normal Conv without deformable + """ + def __init__(self, + in_channels, + out_channels, + norm_func): + super(NormalConv, self).__init__() + + self.norm = norm_func(out_channels) + self.relu = nn.ReLU() + self.conv = nn.Conv2D(in_channels, + out_channels, + kernel_size=(3, 3), + padding=1) + + def forward(self, x): + """forward + """ + + x = self.conv(x) + x = self.norm(x) + x = self.relu(x) + + return x + +def _make_conv_level(in_channels, out_channels, num_convs, norm_func, + stride=1, dilation=1): + """ + make conv layers based on its number. + """ + layers = [] + for i in range(num_convs): + layers.extend([ + nn.Conv2D(in_channels, out_channels, kernel_size=3, + stride=stride if i == 0 else 1, + padding=dilation, bias_attr=False, dilation=dilation), + norm_func(out_channels), + nn.ReLU()]) + + in_channels = out_channels + + return nn.Sequential(*layers) + +@manager.BACKBONES.add_component +def DLA34(**kwargs): + + model = DLA( + levels=[1, 1, 1, 2, 2, 1], + channels=[16, 32, 64, 128, 256, 512], + block="BasicBlock", + **kwargs + ) + + return model \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/heads/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/models/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..838f94e49728a49fc153971423e0f63efa3f2fb6 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/heads/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
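The `DLA34` factory above assembles a six-level DLA trunk with the `DLAUp`/`IDAUp` decoders, so the backbone emits a single feature map at `down_ratio` stride with `channels[first_level]` channels. A hypothetical shape check (not part of the repo; assumes the `smoke` package is importable and that `pretrained=None` simply skips weight loading):

```python
# Sketch: expected output shape of the DLA-34 backbone.
import paddle
from smoke.models.backbones.dla import DLA34

backbone = DLA34(down_ratio=4, pretrained=None)
x = paddle.randn([1, 3, 384, 1280])   # assumed input resolution
feat = backbone(x)
print(feat.shape)                     # expected [1, 64, 96, 320]: stride 4, 64 channels
```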
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .smoke_predictor import SMOKEPredictor +from .smoke_coder import SMOKECoder \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/heads/smoke_coder.py b/PaddleCV/3d_vision/SMOKE/smoke/models/heads/smoke_coder.py new file mode 100644 index 0000000000000000000000000000000000000000..62f79c0761127745de8f5b139a3ae2387c77ef90 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/heads/smoke_coder.py @@ -0,0 +1,487 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np +import paddle + +from smoke.ops import gather_op + + +class SMOKECoder(paddle.nn.Layer): + """SMOKE Coder class + """ + def __init__(self, depth_ref, dim_ref): + super().__init__() + + # self.depth_ref = paddle.to_tensor(depth_ref) + # self.dim_ref = paddle.to_tensor(dim_ref) + + self.depth_decoder = DepthDecoder(depth_ref) + self.dimension_decoder = DimensionDecoder(dim_ref) + + + + @staticmethod + def rad_to_matrix(rotys, N): + """decode rotys to R_matrix + + Args: + rotys (Tensor): roty of objects + N (int): num of batch + + Returns: + Tensor: R matrix with shape (N, 3, 3) + R = [[cos(r), 0, sin(r)], [0, 1, 0], [-cos(r), 0, sin(r)]] + """ + + cos, sin = rotys.cos(), rotys.sin() + + i_temp = paddle.to_tensor([[1, 0, 1], [0, 1, 0], [-1, 0, 1]], dtype="float32") + + # ry = paddle.reshape(i_temp.tile([N, 1]), (N, -1, 3)) + + # ry[:, 0, 0] *= cos + # ry[:, 0, 2] *= sin + # ry[:, 2, 0] *= sin + # ry[:, 2, 2] *= cos + + # slice bug, so use concat + pos1 = (paddle.ones([N], dtype="float32") * cos).unsqueeze(-1) + pos2 = (paddle.zeros([N], dtype="float32")).unsqueeze(-1) + pos3 = (paddle.ones([N], dtype="float32") * sin).unsqueeze(-1) + pos4 = (paddle.zeros([N], dtype="float32")).unsqueeze(-1) + pos5 = (paddle.ones([N], dtype="float32")).unsqueeze(-1) + pos6 = (paddle.zeros([N], dtype="float32")).unsqueeze(-1) + pos7 = (paddle.ones([N], dtype="float32") * (-sin)).unsqueeze(-1) + pos8 = (paddle.zeros([N], dtype="float32")).unsqueeze(-1) + pos9 = (paddle.ones([N], dtype="float32") * cos).unsqueeze(-1) + + ry = paddle.concat([pos1, pos2, pos3, pos4, pos5, pos6, pos7, pos8, pos9], axis=1) + + ry = paddle.reshape(ry, [N, 3, 3]) + + return ry + + def encode_box3d(self, rotys, dims, locs): + """ + construct 3d bounding box for each object. 
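+        Each box is assembled from its (l, h, w) dimensions as eight corners in the
+        object frame, rotated by R(rot_y) and translated to its location, giving a
+        tensor of shape (N, 3, 8).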
+ Args: + rotys: rotation in shape N + dims: dimensions of objects + locs: locations of objects + + Returns: + + """ + if len(rotys.shape) == 2: + rotys = rotys.flatten() + if len(dims.shape) == 3: + dims = paddle.reshape(dims, (-1, 3)) + if len(locs.shape) == 3: + locs = paddle.reshape(locs, (-1, 3)) + + N = rotys.shape[0] + ry = self.rad_to_matrix(rotys, N) + + # if test: + # dims.register_hook(lambda grad: print('dims grad', grad.sum())) + # dims = paddle.reshape(dims, (-1, 1)).tile([1, 8]) + + # dims[::3, :4] = 0.5 * dims[::3, :4] + # dims[1::3, :4] = 0. + # dims[2::3, :4] = 0.5 * dims[2::3, :4] + + # dims[::3, 4:] = -0.5 * dims[::3, 4:] + # dims[1::3, 4:] = -dims[1::3, 4:] + # dims[2::3, 4:] = -0.5 * dims[2::3, 4:] + + + dim_left_1 = (0.5 * dims[:, 0]).unsqueeze(-1) + dim_left_2 = paddle.zeros([dims.shape[0], 1]).astype("float32") #(paddle.zeros_like(dims[:, 1])).unsqueeze(-1) + dim_left_3 = (0.5 * dims[:, 2]).unsqueeze(-1) + dim_left = paddle.concat([dim_left_1, dim_left_2, dim_left_3], axis=1) + dim_left = paddle.reshape(dim_left, (-1, 1)).tile([1, 4]) + + dim_right_1 = (-0.5 * dims[:, 0]).unsqueeze(-1) + dim_right_2 = (-dims[:, 1]).unsqueeze(-1) + dim_right_3 = (-0.5 * dims[:, 2]).unsqueeze(-1) + dim_right = paddle.concat([dim_right_1, dim_right_2, dim_right_3], axis=1) + dim_right = paddle.reshape(dim_right, (-1, 1)).tile([1, 4]) + + + dims = paddle.concat([dim_left, dim_right], axis=1) + + + + index = paddle.to_tensor([[4, 0, 1, 2, 3, 5, 6, 7], + [4, 5, 0, 1, 6, 7, 2, 3], + [4, 5, 6, 0, 1, 2, 3, 7]]).tile([N, 1]) + + box_3d_object = gather_op(dims, 1, index) + + box_3d = paddle.matmul(ry, paddle.reshape(box_3d_object, (N, 3, -1))) + # box_3d += locs.unsqueeze(-1).repeat(1, 1, 8) + box_3d += locs.unsqueeze(-1).tile((1, 1, 8)) + + return box_3d + + def decode_depth(self, depths_offset): + """ + Transform depth offset to depth + """ + #depth = depths_offset * self.depth_ref[1] + self.depth_ref[0] + + #return depth + return self.depth_decoder(depths_offset) + + def decode_location(self, + points, + points_offset, + depths, + Ks, + trans_mats): + """ + retrieve objects location in camera coordinate based on projected points + Args: + points: projected points on feature map in (x, y) + points_offset: project points offset in (delata_x, delta_y) + depths: object depth z + Ks: camera intrinsic matrix, shape = [N, 3, 3] + trans_mats: transformation matrix from image to feature map, shape = [N, 3, 3] + + Returns: + locations: objects location, shape = [N, 3] + """ + + # number of points + N = points_offset.shape[0] + # batch size + N_batch = Ks.shape[0] + batch_id = paddle.arange(N_batch).unsqueeze(1) + # obj_id = batch_id.repeat(1, N // N_batch).flatten() + obj_id = batch_id.tile([1, N // N_batch]).flatten() + + # trans_mats_inv = trans_mats.inverse()[obj_id] + # Ks_inv = Ks.inverse()[obj_id] + inv = trans_mats.inverse() + trans_mats_inv = paddle.concat([inv[int(obj_id[i])].unsqueeze(0) for i in range(len(obj_id))]) + + inv = Ks.inverse() + Ks_inv = paddle.concat([inv[int(obj_id[i])].unsqueeze(0) for i in range(len(obj_id))]) + + points = paddle.reshape(points, (-1, 2)) + assert points.shape[0] == N + + # int + float -> int, but float + int -> float + # proj_points = points + points_offset + proj_points = points_offset + points + + # transform project points in homogeneous form. 
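+        # What follows is standard pinhole back-projection: extend (u, v) to
+        # homogeneous [u, v, 1], map from feature-map to image coordinates with
+        # trans_mat^-1, scale by the predicted depth z, and unproject with K^-1:
+        #     location = K^-1 @ (z * trans_mat^-1 @ [u, v, 1]^T)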
+ proj_points_extend = paddle.concat( + (proj_points.astype("float32"), paddle.ones((N, 1))), axis=1) + # expand project points as [N, 3, 1] + proj_points_extend = proj_points_extend.unsqueeze(-1) + # transform project points back on image + proj_points_img = paddle.matmul(trans_mats_inv, proj_points_extend) + # with depth + proj_points_img = proj_points_img * paddle.reshape(depths, (N, -1, 1)) + # transform image coordinates back to object locations + locations = paddle.matmul(Ks_inv, proj_points_img) + + return locations.squeeze(2) + + def decode_location_without_transmat(self, + points, points_offset, + depths, Ks, down_ratios=None): + """ + retrieve objects location in camera coordinate based on projected points + Args: + points: projected points on feature map in (x, y) + points_offset: project points offset in (delata_x, delta_y) + depths: object depth z + Ks: camera intrinsic matrix, shape = [N, 3, 3] + trans_mats: transformation matrix from image to feature map, shape = [N, 3, 3] + + Returns: + locations: objects location, shape = [N, 3] + """ + + if down_ratios is None: + down_ratios = [(1, 1)] + + + # number of points + N = points_offset.shape[0] + # batch size + N_batch = Ks.shape[0] + #batch_id = paddle.arange(N_batch).unsqueeze(1) + batch_id = paddle.arange(N_batch).reshape((N_batch, 1)) + + + # obj_id = batch_id.repeat(1, N // N_batch).flatten() + obj_id = batch_id.tile([1, N // N_batch]).flatten() + + # Ks_inv = Ks[obj_id] pytorch + + # Ks_inv = paddle.concat([Ks[int(obj_id[i])].unsqueeze(0) for i in range(len(obj_id))]) + length = int(obj_id.shape[0]) + ks_v = [] + for i in range(length): + ks_v.append(Ks[int(obj_id[i])].unsqueeze(0)) + Ks_inv = paddle.concat(ks_v) + + down_ratio = down_ratios[0] + points = paddle.reshape(points, (numel_t(points)//2, 2)) + proj_points = points + points_offset + + # trans point from heatmap to ori image, down_sample * resize_scale + proj_points[:, 0] = down_ratio[0] * proj_points[:, 0] + proj_points[:, 1] = down_ratio[1] * proj_points[:, 1] + # transform project points in homogeneous form. 
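+        # Deployment variant: with no affine trans_mat available at inference
+        # time, heatmap coordinates are mapped back to the original image by the
+        # per-axis down_ratio above. Note that Ks is then applied as-is (despite
+        # the docstring), so the caller is expected to pass the pre-inverted
+        # camera matrix K^-1 here.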
+ + + proj_points_extend = paddle.concat( + [proj_points, paddle.ones((N, 1))], axis=1) + # expand project points as [N, 3, 1] + proj_points_extend = proj_points_extend.unsqueeze(-1) + # with depth + proj_points_img = proj_points_extend * paddle.reshape(depths, (N, numel_t(depths)//N, 1)) + # transform image coordinates back to object locations + locations = paddle.matmul(Ks_inv, proj_points_img) + + return locations.squeeze(2) + + def decode_bbox_2d_without_transmat(self, points, bbox_size, down_ratios=None): + """get bbox 2d + + Args: + points (paddle.Tensor, (50, 2)): 2d center + bbox_size (paddle.Tensor, (50, 2)): 2d bbox height and width + trans_mats (paddle.Tensor, (1, 3, 3)): transformation coord from img to feature map + """ + + if down_ratios is None: + down_ratios = [(1, 1)] + # number of points + N = bbox_size.shape[0] + points = paddle.reshape(points, (-1, 2)) + assert points.shape[0] == N + + box2d = paddle.zeros((N, 4)) + down_ratio = down_ratios[0] + box2d[:, 0] = (points[:, 0] - bbox_size[:, 0] / 2) + box2d[:, 1] = (points[:, 1] - bbox_size[:, 1] / 2) + box2d[:, 2] = (points[:, 0] + bbox_size[:, 0] / 2) + box2d[:, 3] = (points[:, 1] + bbox_size[:, 1] / 2) + + box2d[:, 0] = down_ratio[0] * box2d[:, 0] + box2d[:, 1] = down_ratio[1] * box2d[:, 1] + box2d[:, 2] = down_ratio[0] * box2d[:, 2] + box2d[:, 3] = down_ratio[1] * box2d[:, 3] + + return box2d + + def decode_dimension(self, cls_id, dims_offset): + """ + retrieve object dimensions + Args: + cls_id: each object id + dims_offset: dimension offsets, shape = (N, 3) + + Returns: + + """ + # cls_id = cls_id.flatten().long() + # dims_select = self.dim_ref[cls_id, :] + # cls_id = cls_id.flatten() + # dims_select = paddle.concat([self.dim_ref[int(cls_id[i])].unsqueeze(0) for i in range(len(cls_id))]) + # dimensions = dims_offset.exp() * dims_select + + # return dimensions + return self.dimension_decoder(cls_id, dims_offset) + + def decode_orientation(self, vector_ori, locations, flip_mask=None): + """ + retrieve object orientation + Args: + vector_ori: local orientation in [sin, cos] format + locations: object location + + Returns: for training we only need roty + for testing we need both alpha and roty + + """ + + locations = paddle.reshape(locations, (-1, 3)) + rays = paddle.atan(locations[:, 0] / (locations[:, 2] + 1e-7)) + alphas = paddle.atan(vector_ori[:, 0] / (vector_ori[:, 1] + 1e-7)) + + # get cosine value positive and negtive index. + cos_pos_idx = (vector_ori[:, 1] >= 0).nonzero() + cos_neg_idx = (vector_ori[:, 1] < 0).nonzero() + + PI = 3.14159 + for i in range(cos_pos_idx.shape[0]): + ind = int(cos_pos_idx[i,0]) + alphas[ind] = alphas[ind] - PI / 2 + for i in range(cos_neg_idx.shape[0]): + ind = int(cos_neg_idx[i,0]) + alphas[ind] = alphas[ind] + PI / 2 + + # alphas[cos_pos_idx] -= PI / 2 + # alphas[cos_neg_idx] += PI / 2 + + # retrieve object rotation y angle. + rotys = alphas + rays + + # in training time, it does not matter if angle lies in [-PI, PI] + # it matters at inference time? todo: does it really matter if it exceeds. 
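+        # rotys = alpha + arctan(x / z) can leave [-PI, PI] by less than 2 * PI,
+        # so a single +/- 2 * PI correction below is enough to wrap it back.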
+ larger_idx = (rotys > PI).nonzero() + small_idx = (rotys < -PI).nonzero() + + if len(larger_idx) != 0: + for i in range(larger_idx.shape[0]): + ind = int(larger_idx[i,0]) + rotys[ind] -= 2 * PI + if len(small_idx) != 0: + for i in range(small_idx.shape[0]): + ind = int(small_idx[i,0]) + rotys[ind] += 2 * PI + + if flip_mask is not None: + + fm = flip_mask.astype("float32").flatten() + rotys_flip = fm * rotys + + # rotys_flip_pos_idx = rotys_flip > 0 + # rotys_flip_neg_idx = rotys_flip < 0 + # rotys_flip[rotys_flip_pos_idx] -= PI + # rotys_flip[rotys_flip_neg_idx] += PI + + rotys_flip_pos_idx = (rotys_flip > 0).nonzero() + rotys_flip_neg_idx = (rotys_flip < 0).nonzero() + + for i in range(rotys_flip_pos_idx.shape[0]): + ind = int(rotys_flip_pos_idx[i, 0]) + rotys_flip[ind] -= PI + for i in range(rotys_flip_neg_idx.shape[0]): + ind = int(rotys_flip_neg_idx[i, 0]) + rotys_flip[ind] += PI + + + rotys_all = fm * rotys_flip + (1 - fm) * rotys + + return rotys_all + + else: + return rotys, alphas + + def decode_bbox_2d(self, points, bbox_size, trans_mats, img_size): + """get bbox 2d + + Args: + points (paddle.Tensor, (50, 2)): 2d center + bbox_size (paddle.Tensor, (50, 2)): 2d bbox height and width + trans_mats (paddle.Tensor, (1, 3, 3)): transformation coord from img to feature map + """ + + + img_size = img_size.flatten() + + # number of points + N = bbox_size.shape[0] + # batch size + N_batch = trans_mats.shape[0] + batch_id = paddle.arange(N_batch).unsqueeze(1) + # obj_id = batch_id.repeat(1, N // N_batch).flatten() + obj_id = batch_id.tile([1, N // N_batch]).flatten() + + inv = trans_mats.inverse() + trans_mats_inv = paddle.concat([inv[int(obj_id[i])].unsqueeze(0) for i in range(len(obj_id))]) + + #trans_mats_inv = trans_mats.inverse()[obj_id] + points = paddle.reshape(points, (-1, 2)) + assert points.shape[0] == N + + box2d = paddle.zeros([N, 4]) + box2d[:, 0] = (points[:, 0] - bbox_size[:, 0] / 2) + box2d[:, 1] = (points[:, 1] - bbox_size[:, 1] / 2) + box2d[:, 2] = (points[:, 0] + bbox_size[:, 0] / 2) + box2d[:, 3] = (points[:, 1] + bbox_size[:, 1] / 2) + # transform project points in homogeneous form. 
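+        # The two box corners (top-left and bottom-right) are lifted to
+        # homogeneous coordinates and mapped from feature-map space back to the
+        # original image with trans_mat^-1, mirroring decode_location.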
+ proj_points_extend_top = paddle.concat( + (box2d[:, :2], paddle.ones([N, 1])), axis=1) + proj_points_extend_bot = paddle.concat( + (box2d[:, 2:], paddle.ones([N, 1])), axis=1) + + # expand project points as [N, 3, 1] + proj_points_extend_top = proj_points_extend_top.unsqueeze(-1) + proj_points_extend_bot = proj_points_extend_bot.unsqueeze(-1) + + # transform project points back on image + proj_points_img_top = paddle.matmul(trans_mats_inv, proj_points_extend_top) + proj_points_img_bot = paddle.matmul(trans_mats_inv, proj_points_extend_bot) + box2d[:, :2] = proj_points_img_top.squeeze(2)[:, :2] + box2d[:, 2:] = proj_points_img_bot.squeeze(2)[:, :2] + + box2d[:, ::2] = box2d[:, ::2].clip(0, img_size[0]) + box2d[:, 1::2] = box2d[:, 1::2].clip(0, img_size[1]) + return box2d + +class DepthDecoder(paddle.nn.Layer): + def __init__(self, depth_ref): + super().__init__() + self.depth_ref = paddle.to_tensor(depth_ref) + def forward(self, depths_offset): + """ + Transform depth offset to depth + """ + depth = depths_offset * self.depth_ref[1] + self.depth_ref[0] + + return depth + +class DimensionDecoder(paddle.nn.Layer): + def __init__(self, dim_ref): + super().__init__() + self.dim_ref = paddle.to_tensor(dim_ref) + + def forward(self, cls_id, dims_offset): + """ + retrieve object dimensions + Args: + cls_id: each object id + dims_offset: dimension offsets, shape = (N, 3) + + Returns: + + """ + # cls_id = cls_id.flatten().long() + # dims_select = self.dim_ref[cls_id, :] + cls_id = cls_id.flatten() + + #dims_select = paddle.concat([self.dim_ref[int(cls_id[i])].unsqueeze(0) for i in range(len(cls_id))]) + length = int(cls_id.shape[0]) + list_v = [] + for i in range(length): + list_v.append(self.dim_ref[int(cls_id[i])].unsqueeze(0)) + dims_select = paddle.concat(list_v) + + dimensions = dims_offset.exp() * dims_select + + return dimensions + +def numel_t(var): + from numpy import prod + assert -1 not in var.shape + return prod(var.shape) \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/heads/smoke_predictor.py b/PaddleCV/3d_vision/SMOKE/smoke/models/heads/smoke_predictor.py new file mode 100644 index 0000000000000000000000000000000000000000..e78ac73fe7af3e9138885884bfe86e27427d5de1 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/heads/smoke_predictor.py @@ -0,0 +1,151 @@ + +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
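The `DepthDecoder` and `DimensionDecoder` defined above implement SMOKE's reference-based decoding: depth is a standardized offset against dataset statistics, and dimensions are exponential residuals on per-class priors, which keeps them positive. A hypothetical numeric check (the reference values mirror the KITTI config; the offsets are made up):

```python
# Sketch: reference-based decoding of depth and dimensions.
import numpy as np

depth_ref = (28.01, 16.32)                       # (mean, std)
dim_ref = np.array([[3.88, 1.63, 1.53],          # Car (l, h, w)
                    [1.78, 1.70, 0.58],          # Pedestrian
                    [0.88, 1.73, 0.67]])         # Cyclist

depth = 0.25 * depth_ref[1] + depth_ref[0]       # 32.09 m from offset 0.25
dims = np.exp([0.10, -0.05, 0.0]) * dim_ref[0]   # rescale the Car prior
print(depth, dims)                               # dims > 0 for any real offset
```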
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from smoke.models.layers import group_norm, sigmoid_hm +from smoke.cvlibs import manager, param_init + +@manager.HEADS.add_component +class SMOKEPredictor(nn.Layer): + """SMOKE Predictor + """ + def __init__(self, + num_classes=3, + reg_heads=10, + reg_channels=(1, 2, 3, 2, 2), + num_chanels=256, + norm_type="gn", + in_channels=64): + super().__init__() + + regression = reg_heads + regression_channels = reg_channels + head_conv = num_chanels + norm_func = nn.BatchNorm2D if norm_type == "bn" else group_norm + + assert sum(regression_channels) == regression, \ + "the sum of {} must be equal to regression channel of {}".format( + reg_channels, reg_heads + ) + + self.dim_channel = get_channel_spec(regression_channels, name="dim") + self.ori_channel = get_channel_spec(regression_channels, name="ori") + + + self.class_head = nn.Sequential( + nn.Conv2D(in_channels, + head_conv, + kernel_size=3, + padding=1, + bias_attr=True), + + norm_func(head_conv), + + nn.ReLU(), + + nn.Conv2D(head_conv, + num_classes, + kernel_size=1, + padding=1 // 2, + bias_attr=True) + ) + + # todo: what is datafill here + #self.class_head[-1].bias.data.fill_(-2.19) + param_init.constant_init(self.class_head[-1].bias, value=-2.19) + + self.regression_head = nn.Sequential( + nn.Conv2D(in_channels, + head_conv, + kernel_size=3, + padding=1, + bias_attr=True), + + norm_func(head_conv), + + nn.ReLU(), + + nn.Conv2D(head_conv, + regression, + kernel_size=1, + padding=1 // 2, + bias_attr=True) + ) + + #_fill_fc_weights(self.regression_head) + self.init_weight(self.regression_head) + + def forward(self, features): + """predictor forward + + Args: + features (paddle.Tensor): smoke backbone output + + Returns: + list: sigmoid class heatmap and regression map + """ + head_class = self.class_head(features) + head_regression = self.regression_head(features) + head_class = sigmoid_hm(head_class) + + # (N, C, H, W) + + # left slice bug + # offset_dims = head_regression[:, self.dim_channel, :, :].clone() + # head_regression[:, self.dim_channel, :, :] = F.sigmoid(offset_dims) - 0.5 + # vector_ori = head_regression[:, self.ori_channel, :, :].clone() + # head_regression[:, self.ori_channel, :, :] = F.normalize(vector_ori) + + offset_dims = head_regression[:, self.dim_channel, :, :].clone() + head_reg_dim = F.sigmoid(offset_dims) - 0.5 + + vector_ori = head_regression[:, self.ori_channel, :, :].clone() + head_reg_ori = F.normalize(vector_ori) + + head_regression_left = head_regression[:, :self.dim_channel.start, :, :] + head_regression_right = head_regression[:, self.ori_channel.stop:, :, :] + head_regression = paddle.concat([head_regression_left, head_reg_dim, head_reg_ori, head_regression_right], axis=1) + + + return [head_class, head_regression] + + def init_weight(self, block): + for sublayer in block.sublayers(): + if isinstance(sublayer, nn.Conv2D): + param_init.constant_init(sublayer.bias, value=0.0) + + +def get_channel_spec(reg_channels, name): + """get dim and ori dim + + Args: + reg_channels (tuple): regress channels, default(1, 2, 3, 2) for + (depth_offset, keypoint_offset, dims, ori) + name (str): dim or ori + + Returns: + slice: for start channel to stop channel + """ + if name == "dim": + s = sum(reg_channels[:2]) + e = sum(reg_channels[:3]) + elif name == "ori": + s = sum(reg_channels[:3]) + e = sum(reg_channels[:4]) + + return slice(s, e, 1) \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/layers/__init__.py 
b/PaddleCV/3d_vision/SMOKE/smoke/models/layers/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..a1111b028fc3c18129a76abff035a9a1ec160943
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/models/layers/__init__.py
@@ -0,0 +1,16 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .gn import group_norm
+from .layer_libs import sigmoid_hm, nms_hm, select_topk, select_point_of_interest
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/layers/gn.py b/PaddleCV/3d_vision/SMOKE/smoke/models/layers/gn.py
new file mode 100644
index 0000000000000000000000000000000000000000..e968a40f98c0dd0eba0748c599729f6ce8c95a87
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/models/layers/gn.py
@@ -0,0 +1,32 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+def group_norm(out_channels):
+    """group normalization helper
+
+    Args:
+        out_channels (int): number of output channels
+
+    Returns:
+        nn.Layer: a GroupNorm layer (32 groups, or 16 if channels % 32 != 0)
+    """
+    num_groups = 32
+    if out_channels % 32 == 0:
+        return nn.GroupNorm(num_groups, out_channels)
+    else:
+        return nn.GroupNorm(num_groups // 2, out_channels)
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/layers/layer_libs.py b/PaddleCV/3d_vision/SMOKE/smoke/models/layers/layer_libs.py
new file mode 100644
index 0000000000000000000000000000000000000000..33f8696d488156b7c469f6f031bd90ba30cda45c
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/models/layers/layer_libs.py
@@ -0,0 +1,149 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
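`group_norm` above hard-codes 32 groups and halves that to 16 whenever the channel count is not divisible by 32, since `nn.GroupNorm` requires the group count to divide the channels. A small illustrative check, assuming the package imports cleanly:

```python
# Sketch: group count selection in group_norm.
from smoke.models.layers import group_norm

print(group_norm(64))   # 64 % 32 == 0 -> GroupNorm with 32 groups
print(group_norm(48))   # 48 % 32 != 0 -> GroupNorm with 16 groups
```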
+ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from smoke.ops import gather_op + + +def sigmoid_hm(hm_features): + """sigmoid to headmap + + Args: + hm_features (paddle.Tensor): heatmap + + Returns: + paddle.Tensor: sigmoid heatmap + """ + x = F.sigmoid(hm_features) + x = x.clip(min=1e-4, max=1 - 1e-4) + + return x + +def nms_hm(heat_map, kernel=3): + """Do max_pooling for nms + + Args: + heat_map (paddle.Tensor): pred cls heatmap + kernel (int, optional): max_pool kernel size. Defaults to 3. + + Returns: + heatmap after nms + """ + pad = (kernel - 1) // 2 + + hmax = F.max_pool2d(heat_map, + kernel_size=(kernel, kernel), + stride=1, + padding=pad) + eq_index = (hmax == heat_map).astype("float32") + + return heat_map * eq_index + + +def select_topk(heat_map, K=100): + """ + Args: + heat_map: heat_map in [N, C, H, W] + K: top k samples to be selected + score: detection threshold + + Returns: + + """ + + #batch, c, height, width = paddle.shape(heat_map) + + batch, c = heat_map.shape[:2] + height = paddle.shape(heat_map)[2] + width = paddle.shape(heat_map)[3] + + # First select topk scores in all classes and batchs + # [N, C, H, W] -----> [N, C, H*W] + heat_map = paddle.reshape(heat_map, (batch, c, -1)) + # Both in [N, C, K] + topk_scores_all, topk_inds_all = paddle.topk(heat_map, K) + + + # topk_inds_all = topk_inds_all % (height * width) # todo: this seems redudant + topk_ys = (topk_inds_all // width).astype("float32") + topk_xs = (topk_inds_all % width).astype("float32") + + + # Select topK examples across channel + # [N, C, K] -----> [N, C*K] + topk_scores_all = paddle.reshape(topk_scores_all, (batch, -1)) + # Both in [N, K] + topk_scores, topk_inds = paddle.topk(topk_scores_all, K) + topk_clses = (topk_inds // K).astype("float32") + + # First expand it as 3 dimension + topk_inds_all = paddle.reshape(_gather_feat(paddle.reshape(topk_inds_all, (batch, -1, 1)), topk_inds), (batch, K)) + topk_ys = paddle.reshape(_gather_feat(paddle.reshape(topk_ys, (batch, -1, 1)), topk_inds), (batch, K)) + topk_xs = paddle.reshape(_gather_feat(paddle.reshape(topk_xs, (batch, -1, 1)), topk_inds), (batch, K)) + + return dict({"topk_score": topk_scores, "topk_inds_all": topk_inds_all, + "topk_clses": topk_clses, "topk_ys": topk_ys, "topk_xs": topk_xs}) + + +def _gather_feat(feat, ind, mask=None): + """ + Select specific indexs on featuremap + Args: + feat: all results in 3 dimensions + ind: positive index + + Returns: + + """ + channel = feat.shape[-1] + ind = ind.unsqueeze(-1).expand((ind.shape[0], ind.shape[1], channel)) + + feat = gather_op(feat, 1, ind) + + if mask is not None: + mask = mask.unsqueeze(2).expand_as(feat) + feat = feat[mask] + feat = feat.view(-1, channel) + return feat + +def select_point_of_interest(batch, index, feature_maps): + """ + Select POI(point of interest) on feature map + Args: + batch: batch size + index: in point format or index format + feature_maps: regression feature map in [N, C, H, W] + + Returns: + + """ + w = feature_maps.shape[3] + index_length = len(index.shape) + if index_length == 3: + index = index[:, :, 1] * w + index[:, :, 0] + index = paddle.reshape(index, (batch, -1)) + # [N, C, H, W] -----> [N, H, W, C] + feature_maps = paddle.transpose(feature_maps, (0, 2, 3, 1)) + channel = feature_maps.shape[-1] + # [N, H, W, C] -----> [N, H*W, C] + feature_maps = paddle.reshape(feature_maps, (batch, -1, channel)) + # expand index in channels + index = index.unsqueeze(-1).tile((1, 1, channel)) + # select specific features bases on POIs + 
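+    # index now has shape [N, K, C]; gather_op picks, per batch element, the
+    # C-dimensional feature vector at each of the K flattened spatial
+    # positions, so the result has shape [N, K, C].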
feature_maps = gather_op(feature_maps, 1, index) + + return feature_maps \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/losses/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/models/losses/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..6553445dc01603564861b4505359083b358318fc --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/losses/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .focal_loss import FocalLoss +from .loss import SMOKELossComputation \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/losses/focal_loss.py b/PaddleCV/3d_vision/SMOKE/smoke/models/losses/focal_loss.py new file mode 100644 index 0000000000000000000000000000000000000000..133b251749fdfb433fc6c8cdba2f95d507709434 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/losses/focal_loss.py @@ -0,0 +1,57 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +from paddle import nn + + +class FocalLoss(nn.Layer): + """Focal loss class + """ + def __init__(self, alpha=2, beta=4): + super().__init__() + self.alpha = alpha + self.beta = beta + + def forward(self, prediction, target): + """forward + + Args: + prediction (paddle.Tensor): model prediction + target (paddle.Tensor): ground truth + + Returns: + paddle.Tensor: focal loss + """ + positive_index = (target == 1).astype("float32") + negative_index = (target < 1).astype("float32") + + negative_weights = paddle.pow(1 - target, self.beta) + loss = 0. + + positive_loss = paddle.log(prediction) \ + * paddle.pow(1 - prediction, self.alpha) * positive_index + negative_loss = paddle.log(1 - prediction) \ + * paddle.pow(prediction, self.alpha) * negative_weights * negative_index + + num_positive = positive_index.sum() + positive_loss = positive_loss.sum() + negative_loss = negative_loss.sum() + + if num_positive == 0: + loss -= negative_loss + else: + loss -= (positive_loss + negative_loss) / num_positive + + return loss diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/losses/loss.py b/PaddleCV/3d_vision/SMOKE/smoke/models/losses/loss.py new file mode 100644 index 0000000000000000000000000000000000000000..7055f145abc00db23175961639fe697a69080acc --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/losses/loss.py @@ -0,0 +1,204 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import copy + +import numpy as np +import cv2 +import paddle +import paddle.nn as nn +from paddle.nn import functional as F + +from smoke.models.losses import FocalLoss +from smoke.models.layers import select_point_of_interest +from smoke.cvlibs import manager +from smoke.models.heads import SMOKECoder + +@manager.LOSSES.add_component +class SMOKELossComputation(object): + """Convert targets and preds to heatmaps®s, compute + loss with CE and L1 + """ + def __init__(self, + depth_ref, + dim_ref, + reg_loss="DisL1", + loss_weight=(1., 10.), + max_objs=50): + + self.smoke_coder = SMOKECoder(depth_ref, dim_ref) + self.cls_loss = FocalLoss(alpha=2, beta=4) + self.reg_loss = reg_loss + self.loss_weight = loss_weight + self.max_objs = max_objs + + def prepare_targets(self, targets): + """get heatmaps, regressions and 3D infos from targets + """ + + heatmaps = targets["hm"] + regression = targets["reg"] + cls_ids = targets["cls_ids"] + proj_points = targets["proj_p"] + dimensions = targets["dimensions"] + locations = targets["locations"] + rotys = targets["rotys"] + trans_mat = targets["trans_mat"] + K = targets["K"] + reg_mask = targets["reg_mask"] + flip_mask = targets["flip_mask"] + bbox_size = targets["bbox_size"] + c_offsets = targets["c_offsets"] + + return heatmaps, regression, dict(cls_ids=cls_ids, + proj_points=proj_points, + dimensions=dimensions, + locations=locations, + rotys=rotys, + trans_mat=trans_mat, + K=K, + reg_mask=reg_mask, + flip_mask=flip_mask, + bbox_size=bbox_size, + c_offsets=c_offsets) + + def prepare_predictions(self, targets_variables, pred_regression): + """decode model predictions + """ + batch, channel = pred_regression.shape[0], pred_regression.shape[1] + targets_proj_points = targets_variables["proj_points"] + + # obtain prediction from points of interests + pred_regression_pois = select_point_of_interest( + batch, targets_proj_points, pred_regression + ) + pred_regression_pois = paddle.reshape(pred_regression_pois, (-1, channel)) + + # FIXME: fix hard code here + pred_depths_offset = pred_regression_pois[:, 0] + pred_proj_offsets = pred_regression_pois[:, 1:3] + pred_dimensions_offsets = pred_regression_pois[:, 3:6] + pred_orientation = pred_regression_pois[:, 6:8] + # pred_bboxsize = paddle.zeros_like(pred_regression_pois[:, 6:8]) + pred_bboxsize = pred_regression_pois[:, 8:10] + # pred_c_offsets = pred_regression_pois[:, 10:12] + + + pred_depths = self.smoke_coder.decode_depth(pred_depths_offset) + pred_locations = self.smoke_coder.decode_location( + targets_proj_points, + pred_proj_offsets, + pred_depths, + targets_variables["K"], + targets_variables["trans_mat"] + ) + + pred_dimensions = self.smoke_coder.decode_dimension( + targets_variables["cls_ids"], + pred_dimensions_offsets, + ) + # we need to change center location to bottom location + # bug on left slice + # pred_locations[:, 1] += pred_dimensions[:, 1] / 2 + + pred_locations_x = (pred_locations[:, 0]).unsqueeze(-1) + pred_locations_y = 
(pred_locations[:, 1] + pred_dimensions[:, 1] / 2).unsqueeze(-1) + pred_locations_z = (pred_locations[:, 2]).unsqueeze(-1) + pred_locations = paddle.concat([pred_locations_x, pred_locations_y, pred_locations_z], axis=1) + + pred_rotys = self.smoke_coder.decode_orientation( + pred_orientation, + targets_variables["locations"], + targets_variables["flip_mask"] + ) + + if self.reg_loss == "DisL1": + pred_box3d_rotys = self.smoke_coder.encode_box3d( + pred_rotys, + targets_variables["dimensions"], + targets_variables["locations"] + ) + + pred_box3d_dims = self.smoke_coder.encode_box3d( + targets_variables["rotys"], + pred_dimensions, + targets_variables["locations"] + ) + pred_box3d_locs = self.smoke_coder.encode_box3d( + targets_variables["rotys"], + targets_variables["dimensions"], + pred_locations + ) + + + return dict(ori=pred_box3d_rotys, + dim=pred_box3d_dims, + loc=pred_box3d_locs, + bbox=pred_bboxsize,) + # coff=pred_c_offsets) + + elif self.reg_loss == "L1": + pred_box_3d = self.smoke_coder.encode_box3d( + pred_rotys, + pred_dimensions, + pred_locations + ) + return pred_box_3d + + def __call__(self, predictions, targets): + pred_heatmap, pred_regression = predictions[0], predictions[1] + + targets_heatmap, targets_regression, targets_variables \ + = self.prepare_targets(targets) + + predict_boxes3d = self.prepare_predictions(targets_variables, pred_regression) + + hm_loss = self.cls_loss(pred_heatmap, targets_heatmap) * self.loss_weight[0] + + + targets_regression = paddle.reshape(targets_regression, ( + -1, targets_regression.shape[2], targets_regression.shape[3] + )) + + reg_mask = targets_variables["reg_mask"].astype("float32").flatten() + reg_mask = paddle.reshape(reg_mask, (-1, 1, 1)) + reg_mask = reg_mask.expand_as(targets_regression) + + if self.reg_loss == "DisL1": + reg_loss_ori = F.l1_loss( + predict_boxes3d["ori"] * reg_mask, + targets_regression * reg_mask, + reduction="sum") / (self.loss_weight[1] * self.max_objs) + + reg_loss_dim = F.l1_loss( + predict_boxes3d["dim"] * reg_mask, + targets_regression * reg_mask, + reduction="sum") / (self.loss_weight[1] * self.max_objs) + + reg_loss_loc = F.l1_loss( + predict_boxes3d["loc"] * reg_mask, + targets_regression * reg_mask, + reduction="sum") / (self.loss_weight[1] * self.max_objs) + + reg_loss_size = F.l1_loss( + predict_boxes3d["bbox"], + paddle.reshape(targets_variables["bbox_size"],(-1, targets_variables["bbox_size"].shape[-1])), + reduction="sum") / (self.loss_weight[1] * self.max_objs) + + losses = dict(hm_loss=hm_loss, + reg_loss=reg_loss_ori + reg_loss_dim + reg_loss_loc, + size_loss=reg_loss_size) + + return losses \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7d0052313d1994eb94b67f6cad8ea090db162202 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +from .processor import PostProcessor +from .processorhm import PostProcessorHm \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/processor.py b/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/processor.py new file mode 100644 index 0000000000000000000000000000000000000000..94be5721f5d7f1171f46f649d25e98fc451fbd13 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/processor.py @@ -0,0 +1,118 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +from paddle import nn + +from smoke.models.layers import nms_hm, select_topk, select_point_of_interest +from smoke.cvlibs import manager +from smoke.models.heads import SMOKECoder + +@manager.POSTPROCESSORS.add_component +class PostProcessor(nn.Layer): + def __init__(self, + depth_ref, + dim_ref, + reg_head=10, + det_threshold=0.25, + max_detection=50, + pred_2d=True): + super().__init__() + + self.smoke_coder = SMOKECoder(depth_ref, dim_ref) + self.reg_head = reg_head + self.max_detection = max_detection + self.det_threshold = det_threshold + self.pred_2d = pred_2d + + def forward(self, predictions, targets): + + pred_heatmap, pred_regression = predictions[0], predictions[1] + batch = pred_heatmap.shape[0] + + heatmap = nms_hm(pred_heatmap) + + topk_dict = select_topk( + heatmap, + K=self.max_detection, + ) + scores, indexs = topk_dict["topk_score"], topk_dict["topk_inds_all"] + clses, ys = topk_dict["topk_clses"], topk_dict["topk_ys"] + xs = topk_dict["topk_xs"] + + pred_regression = select_point_of_interest( + batch, indexs, pred_regression + ) + + pred_regression_pois = paddle.reshape(pred_regression, (-1, self.reg_head)) + + pred_proj_points = paddle.concat([paddle.reshape(xs, (-1, 1)), paddle.reshape(ys, (-1, 1))], axis=1) + + # FIXME: fix hard code here + pred_depths_offset = pred_regression_pois[:, 0] + pred_proj_offsets = pred_regression_pois[:, 1:3] + pred_dimensions_offsets = pred_regression_pois[:, 3:6] + pred_orientation = pred_regression_pois[:, 6:8] + pred_bbox_size = pred_regression_pois[:, 8:10] + + pred_depths = self.smoke_coder.decode_depth(pred_depths_offset) + pred_locations = self.smoke_coder.decode_location( + pred_proj_points, + pred_proj_offsets, + pred_depths, + targets["K"], + targets["trans_mat"]) + pred_dimensions = self.smoke_coder.decode_dimension( + clses, + pred_dimensions_offsets + ) + # we need to change center location to bottom location + pred_locations[:, 1] += pred_dimensions[:, 1] / 2 + + pred_rotys, pred_alphas = self.smoke_coder.decode_orientation( + pred_orientation, + pred_locations + ) + + if self.pred_2d: + box2d = self.smoke_coder.decode_bbox_2d(pred_proj_points, pred_bbox_size, + targets["trans_mat"], + targets["image_size"]) + else: + box2d = paddle.to_tensor([0, 0, 0, 0]) + + # change variables to the same dimension + clses = paddle.reshape(clses, (-1, 1)) + 
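+        # Each field becomes a column so the final concat yields one
+        # [K, 14] KITTI-style row per detection:
+        #     (cls, alpha, x1, y1, x2, y2, h, w, l, x, y, z, rot_y, score)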
pred_alphas = paddle.reshape(pred_alphas, (-1, 1)) + pred_rotys = paddle.reshape(pred_rotys, (-1, 1)) + scores = paddle.reshape(scores, (-1, 1)) + + l, h, w = pred_dimensions.chunk(3, 1) + pred_dimensions = paddle.concat([h, w, l], axis=1) + + + result = paddle.concat([ + clses, pred_alphas, box2d, pred_dimensions, pred_locations, pred_rotys, scores + ], axis=1) + + keep_idx = result[:, -1] > self.det_threshold + + if paddle.sum(keep_idx.astype("int32")) >= 1: + keep_idx = paddle.nonzero(result[:, -1] > self.det_threshold) + result = paddle.gather(result, keep_idx) + else: + result = paddle.to_tensor([]) + + + return result \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/processorhm.py b/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/processorhm.py new file mode 100644 index 0000000000000000000000000000000000000000..c499dd7a699e4cd20ae493d29d6a50ceb69790f4 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/postprocess/processorhm.py @@ -0,0 +1,107 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +from paddle import nn + +from smoke.models.heads import SMOKECoder +from smoke.models.layers import nms_hm, select_topk, select_point_of_interest +from smoke.cvlibs import manager + +@manager.POSTPROCESSORS.add_component +class PostProcessorHm(nn.Layer): + def __init__(self, + depth_ref, + dim_ref, + reg_head=10, + det_threshold=0.25, + max_detection=50, + pred_2d=True): + super().__init__() + + self.smoke_coder = SMOKECoder(depth_ref, dim_ref) + self.max_detection = max_detection + + def forward(self, predictions, cam_info): + + pred_heatmap, pred_regression = predictions[0], predictions[1] + batch = pred_heatmap.shape[0] + + heatmap = nms_hm(pred_heatmap) + + topk_dict = select_topk( + heatmap, + K=self.max_detection, + ) + scores, indexs = topk_dict["topk_score"], topk_dict["topk_inds_all"] + clses, ys = topk_dict["topk_clses"], topk_dict["topk_ys"] + xs = topk_dict["topk_xs"] + + pred_regression = select_point_of_interest( + batch, indexs, pred_regression + ) + + # pred_regression_pois = paddle.reshape(pred_regression, (pred_regression.numel()//10, 10)) + # pred_proj_points = paddle.concat([paddle.reshape(xs, (xs.numel(), 1)), paddle.reshape(ys, (ys.numel(), 1))], axis=1) + + pred_regression_pois = paddle.reshape(pred_regression, (numel_t(pred_regression)//10, 10)) + pred_proj_points = paddle.concat([paddle.reshape(xs, (numel_t(xs), 1)), paddle.reshape(ys, (numel_t(ys), 1))], axis=1) + + # FIXME: fix hard code here + pred_depths_offset = pred_regression_pois[:, 0] + pred_proj_offsets = pred_regression_pois[:, 1:3] + pred_dimensions_offsets = pred_regression_pois[:, 3:6] + pred_orientation = pred_regression_pois[:, 6:8] + pred_bbox_size = pred_regression_pois[:, 8:10] + + pred_depths = self.smoke_coder.decode_depth(pred_depths_offset) + pred_locations = self.smoke_coder.decode_location_without_transmat( + pred_proj_points, + pred_proj_offsets, + pred_depths, + 
cam_info[0], cam_info[1]) + pred_dimensions = self.smoke_coder.decode_dimension( + clses, + pred_dimensions_offsets + ) + # we need to change center location to bottom location + pred_locations[:, 1] += pred_dimensions[:, 1] / 2 + + pred_rotys, pred_alphas = self.smoke_coder.decode_orientation( + pred_orientation, + pred_locations + ) + box2d = self.smoke_coder.decode_bbox_2d_without_transmat(pred_proj_points, + pred_bbox_size, cam_info[1]) + # change variables to the same dimension + clses = paddle.reshape(clses, (-1, 1)) + pred_alphas = paddle.reshape(pred_alphas, (-1, 1)) + pred_rotys = paddle.reshape(pred_rotys, (-1, 1)) + scores = paddle.reshape(scores, (-1, 1)) + + l, h, w = pred_dimensions.chunk(3, 1) + pred_dimensions = paddle.concat([h, w, l], axis=1) + + + result = paddle.concat([ + clses, pred_alphas, box2d, pred_dimensions, pred_locations, pred_rotys, scores + ], axis=1) + + + return result + +def numel_t(var): + from numpy import prod + assert -1 not in var.shape + return prod(var.shape) \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/models/smoke.py b/PaddleCV/3d_vision/SMOKE/smoke/models/smoke.py new file mode 100644 index 0000000000000000000000000000000000000000..db5fad7cdfe55912bd770160667dfa534e3d45d2 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/models/smoke.py @@ -0,0 +1,42 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from smoke.cvlibs import manager +from smoke.utils import logger + +@manager.MODELS.add_component +class SMOKE(nn.Layer): + def __init__(self, backbone, head, post_process=None): + super().__init__() + self.backbone = backbone + self.heads = head + self.post_process = post_process + self.init_weight() + + def forward(self, images, targets=None): + features = self.backbone(images) + predictions = self.heads(features) + if not self.training: + return self.post_process(predictions, targets) + + return predictions + + def init_weight(self, bias_lr_factor=2): + for sublayer in self.sublayers(): + if hasattr(sublayer, 'bias') and sublayer.bias is not None: + sublayer.bias.optimize_attr['learning_rate'] = bias_lr_factor \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/ops/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/ops/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..994f1ae8ba0f928a833d75469ef736f8f04fca80 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/ops/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .gather import gather_op \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/ops/gather.py b/PaddleCV/3d_vision/SMOKE/smoke/ops/gather.py new file mode 100644 index 0000000000000000000000000000000000000000..e05804ebb9cf90af428f94cad3d6b40c2747498c --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/ops/gather.py @@ -0,0 +1,60 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +The same function as torch.gather. +Note that: In PaddlePaddle2.0, paddle.gather is different with torch.gather +""" + +import paddle + +def gather_op(x, dim, index): + + dtype_mapping = {"VarType.INT32": "int32", "VarType.INT64": "int64", "paddle.int32": "int32", "paddle.int64": "int64"} + if dim < 0: + dim += len(x.shape) + + x_range = list(range(len(x.shape))) + x_range[0] = dim + x_range[dim] = 0 + x_swaped = paddle.transpose(x, perm=x_range) + + index_range = list(range(len(index.shape))) + index_range[0] = dim + index_range[dim] = 0 + index_swaped = paddle.transpose(index, perm=index_range) + + dtype = dtype_mapping[str(index.dtype)] + x_shape = paddle.shape(x_swaped) + index_shape = paddle.shape(index_swaped) + prod = paddle.prod(x_shape, dtype=dtype) / x_shape[0] + + x_swaped_flattend = paddle.flatten(x_swaped) + index_swaped_flattend = paddle.flatten(index_swaped) + index_swaped_flattend *= prod + + bias = paddle.arange(start=0, end=prod, dtype=dtype) + bias = paddle.reshape(bias, x_shape[1:]) + bias = paddle.crop(bias, index_shape[1:]) + bias = paddle.flatten(bias) + bias = paddle.tile(bias, [index_shape[0]]) + + index_swaped_flattend += bias + + gathered = paddle.index_select(x_swaped_flattend, index_swaped_flattend) + gathered = paddle.reshape(gathered, index_swaped.shape) + + out = paddle.transpose(gathered, perm=x_range) + + return out \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/transforms/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/transforms/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..39b580c10ae2bcc89d7fff7513960a47e4c18a1c --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/transforms/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .transforms import * \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/transforms/functional.py b/PaddleCV/3d_vision/SMOKE/smoke/transforms/functional.py new file mode 100644 index 0000000000000000000000000000000000000000..da28bbed193cef086fdce4be344e6843ceccf7bd --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/transforms/functional.py @@ -0,0 +1,25 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import numpy as np +from PIL import Image, ImageEnhance +from scipy.ndimage.morphology import distance_transform_edt + + +def normalize(im, mean, std): + im = im.astype(np.float32, copy=False) / 255.0 + im -= mean + im /= std + return im \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/transforms/transforms.py b/PaddleCV/3d_vision/SMOKE/smoke/transforms/transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..773a4eb431d3cadfbc3e578a07fe78eac18fa828 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/transforms/transforms.py @@ -0,0 +1,121 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Copy-paste from PaddleSeg with minor modifications. +https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/transforms/transforms.py +""" + +import random + +import cv2 +import numpy as np +from PIL import Image + +from . import functional +from smoke.cvlibs import manager + +@manager.TRANSFORMS.add_component +class Compose: + """ + Do transformation on input data with corresponding pre-processing and augmentation operations. + The shape of input data to all operations is [height, width, channels]. + + Args: + transforms (list): A list contains data pre-processing or augmentation. Empty list means only reading images, no transformation. + to_rgb (bool, optional): If converting image to RGB color space. Default: True. + + Raises: + TypeError: When 'transforms' is not a list. + ValueError: when the length of 'transforms' is less than 1. 
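+
+        Example (a minimal sketch; the image path is a placeholder):
+            compose = Compose([Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
+            im, label = compose('path/to/image.png')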
+ """ + + def __init__(self, transforms, to_rgb=True): + if not isinstance(transforms, list): + raise TypeError('The transforms must be a list!') + self.transforms = transforms + self.to_rgb = to_rgb + + def __call__(self, im, label=None): + """ + Args: + im (str|np.ndarray): It is either image path or image object. + label (str|np.ndarray): It is either label path or label ndarray. + + Returns: + (tuple). A tuple including image, image info, and label after transformation. + """ + if isinstance(im, str): + im = cv2.imread(im).astype('float32') + if isinstance(label, str): + label = np.asarray(Image.open(label)) + if im is None: + raise ValueError('Can\'t read The image file {}!'.format(im)) + if self.to_rgb: + im = np.array(im) + im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) + + for op in self.transforms: + outputs = op(im, label) + im = outputs[0] + if len(outputs) == 2: + label = outputs[1] + + im = np.transpose(im, (2, 0, 1)) + + return (im, label) + +@manager.TRANSFORMS.add_component +class Normalize: + """ + Normalize an image. + + Args: + mean (list, optional): The mean value of a data set. Default: [0.5, 0.5, 0.5]. + std (list, optional): The standard deviation of a data set. Default: [0.5, 0.5, 0.5]. + + Raises: + ValueError: When mean/std is not list or any value in std is 0. + """ + + def __init__(self, mean, std): + self.mean = mean + self.std = std + if not (isinstance(self.mean, (list, tuple)) + and isinstance(self.std, (list, tuple))): + raise ValueError( + "{}: input type is invalid. It should be list or tuple".format( + self)) + from functools import reduce + if reduce(lambda x, y: x * y, self.std) == 0: + raise ValueError('{}: std is invalid!'.format(self)) + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): The Image data. + label (np.ndarray, optional): The label data. Default: None. + + Returns: + (tuple). When label is None, it returns (im, ), otherwise it returns (im, label). + """ + + mean = np.array(self.mean)[np.newaxis, np.newaxis, :] + std = np.array(self.std)[np.newaxis, np.newaxis, :] + im = functional.normalize(im, mean, std) + + if label is None: + return (im, ) + else: + return (im, label) \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/__init__.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8c1bf0eca6e177834d4197e6caff0dba1d447eb8 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+
+from .timer import TimeAverager, calculate_eta
+from .pretrained_utils import load_pretrained_model
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/heatmap_coder.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/heatmap_coder.py
new file mode 100644
index 0000000000000000000000000000000000000000..eda4d5f12a6b48d57f6e185bcf7a4d4054f5ad62
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/heatmap_coder.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+from skimage import transform as trans
+
+def encode_label(K, ry, dims, locs):
+    """Encode a ground-truth label into projection targets.
+
+    Args:
+        K (np.ndarray): camera intrinsic matrix
+        ry (np.ndarray): rotation around the y axis
+        dims (np.ndarray): object dimensions (l, h, w)
+        locs (np.ndarray): object location (x, y, z)
+
+    Returns:
+        tuple: projected 3D center, 2D box and 3D box corners
+    """
+    l, h, w = dims[0], dims[1], dims[2]
+    x, y, z = locs[0], locs[1], locs[2]
+
+    x_corners = [0, l, l, l, l, 0, 0, 0]
+    y_corners = [0, 0, h, h, 0, 0, h, h]
+    z_corners = [0, 0, 0, w, w, w, w, 0]
+
+    x_corners += - np.float32(l) / 2
+    y_corners += - np.float32(h)
+    z_corners += - np.float32(w) / 2
+
+    corners_3d = np.array([x_corners, y_corners, z_corners])
+    rot_mat = np.array([[np.cos(ry), 0, np.sin(ry)],
+                        [0, 1, 0],
+                        [-np.sin(ry), 0, np.cos(ry)]])
+    corners_3d = np.matmul(rot_mat, corners_3d)
+    corners_3d += np.array([x, y, z]).reshape([3, 1])
+
+    loc_center = np.array([x, y - h / 2, z])
+    proj_point = np.matmul(K, loc_center)
+    proj_point = proj_point[:2] / proj_point[2]
+
+    corners_2d = np.matmul(K, corners_3d)
+    corners_2d = corners_2d[:2] / corners_2d[2]
+    box2d = np.array([min(corners_2d[0]), min(corners_2d[1]),
+                      max(corners_2d[0]), max(corners_2d[1])])
+
+    return proj_point, box2d, corners_3d
+
+def get_transfrom_matrix(center_scale, output_size):
+    """Estimate the affine matrix that maps the source (center, scale) box
+    onto an output canvas of size output_size.
+    """
+    center, scale = center_scale[0], center_scale[1]
+    # todo: further add rot and shift here.
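+    # The matrix is estimated from three point pairs: the box center, a point
+    # half the source width above it (src_dir), and a third point perpendicular
+    # to that direction (get_3rd_point), matched to the same construction on
+    # the output canvas.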
+ src_w = scale[0] + dst_w = output_size[0] + dst_h = output_size[1] + + src_dir = np.array([0, src_w * -0.5]) + dst_dir = np.array([0, dst_w * -0.5]) + + src = np.zeros((3, 2), dtype=np.float32) + dst = np.zeros((3, 2), dtype=np.float32) + src[0, :] = center + src[1, :] = center + src_dir + dst[0, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir + + src[2, :] = get_3rd_point(src[0, :], src[1, :]) + dst[2, :] = get_3rd_point(dst[0, :], dst[1, :]) + + get_matrix = trans.estimate_transform("affine", src, dst) + matrix = get_matrix.params + + return matrix.astype(np.float32) + + +def affine_transform(point, matrix): + """do affine transform to label + """ + point_exd = np.array([point[0], point[1], 1.]) + new_point = np.matmul(matrix, point_exd) + + return new_point[:2] + + +def get_3rd_point(point_a, point_b): + """get 3rd point + """ + d = point_a - point_b + point_c = point_b + np.array([-d[1], d[0]]) + return point_c + + +def gaussian_radius(h, w, thresh_min=0.7): + """gaussian radius + """ + a1 = 1 + b1 = h + w + c1 = h * w * (1 - thresh_min) / (1 + thresh_min) + sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1) + r1 = (b1 - sq1) / (2 * a1) + + a2 = 4 + b2 = 2 * (h + w) + c2 = (1 - thresh_min) * w * h + sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2) + r2 = (b2 - sq2) / (2 * a2) + + a3 = 4 * thresh_min + b3 = -2 * thresh_min * (h + w) + c3 = (thresh_min - 1) * w * h + sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3) + r3 = (b3 + sq3) / (2 * a3) + + return min(r1, r2, r3) + + +def gaussian2D(shape, sigma=1): + """get 2D gaussian map + """ + m, n = [(ss - 1.) / 2. for ss in shape] + y, x = np.ogrid[-m:m + 1, -n:n + 1] + + h = np.exp(-(x * x + y * y) / (2 * sigma * sigma)) + h[h < np.finfo(h.dtype).eps * h.max()] = 0 + return h + + +def draw_umich_gaussian(heatmap, center, radius, k=1): + """draw umich gaussian + """ + diameter = 2 * radius + 1 + gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) + + x, y = int(center[0]), int(center[1]) + + height, width = heatmap.shape[0:2] + + left, right = min(x, radius), min(width - x, radius + 1) + top, bottom = min(y, radius), min(height - y, radius + 1) + + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right] + if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: + np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) + + return heatmap diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/logger.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/logger.py new file mode 100644 index 0000000000000000000000000000000000000000..d4a753a152217d03afc108f47569bab96865d2d9 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/logger.py @@ -0,0 +1,54 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Copy-paste from PaddleSeg. 
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/utils/logger.py +""" + +import sys +import time + +import paddle + +levels = {0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'} +log_level = 2 + + +def log(level=2, message=""): + if paddle.distributed.ParallelEnv().local_rank == 0: + current_time = time.time() + time_array = time.localtime(current_time) + current_time = time.strftime("%Y-%m-%d %H:%M:%S", time_array) + if log_level >= level: + print( + "{} [{}]\t{}".format(current_time, levels[level], + message).encode("utf-8").decode("latin1")) + sys.stdout.flush() + + +def debug(message=""): + log(level=3, message=message) + + +def info(message=""): + log(level=2, message=message) + + +def warning(message=""): + log(level=1, message=message) + + +def error(message=""): + log(level=0, message=message) diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/miscellaneous.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/miscellaneous.py new file mode 100644 index 0000000000000000000000000000000000000000..1e6c2ab0dc7a37030fcbac7e4d2313e4c43a6e90 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/miscellaneous.py @@ -0,0 +1,30 @@ + +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import errno +import os + + +def mkdir(path): + """make new dir + + Args: + path (str): path of new dir to make + """ + try: + os.makedirs(path) + except OSError as e: + if e.errno != errno.EEXIST: + raise \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/pretrained_utils.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/pretrained_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..30d785807c7e856de90a0b73ee6999cd47a7faba --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/pretrained_utils.py @@ -0,0 +1,92 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +""" +Copy-paste from PaddleSeg with minor modifications. 
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/utils/utils.py
+"""
+
+import contextlib
+import os
+import tempfile
+
+import paddle
+
+from smoke.utils import logger
+
+
+@contextlib.contextmanager
+def generate_tempdir(directory: str = None, **kwargs):
+    '''Generate a temporary directory'''
+    # The PaddleSeg original reads seg_env.TMP_HOME here; that import was
+    # dropped in this copy, so fall back to the system temp directory.
+    directory = directory if directory else tempfile.gettempdir()
+    with tempfile.TemporaryDirectory(dir=directory, **kwargs) as _dir:
+        yield _dir
+
+
+def load_pretrained_model(model, pretrained_model):
+    if os.path.exists(pretrained_model):
+        para_state_dict = paddle.load(pretrained_model)
+
+        model_state_dict = model.state_dict()
+        keys = model_state_dict.keys()
+        num_params_loaded = 0
+        for k in keys:
+            if k not in para_state_dict:
+                logger.warning("{} is not in the pretrained model".format(k))
+            elif list(para_state_dict[k].shape) != list(
+                    model_state_dict[k].shape):
+                logger.warning(
+                    "[SKIP] Shape of pretrained params {} doesn't match. (Pretrained: {}, Actual: {})"
+                    .format(k, para_state_dict[k].shape,
+                            model_state_dict[k].shape))
+            else:
+                model_state_dict[k] = para_state_dict[k]
+                num_params_loaded += 1
+        model.set_dict(model_state_dict)
+        logger.info("There are {}/{} variables loaded into {}.".format(
+            num_params_loaded, len(model_state_dict),
+            model.__class__.__name__))
+
+    else:
+        raise ValueError(
+            'The pretrained model was not found: {}'.format(
+                pretrained_model))
+
+
+def resume(model, optimizer, resume_model):
+    if resume_model is not None:
+        logger.info('Resuming model from {}'.format(resume_model))
+        if os.path.exists(resume_model):
+            resume_model = os.path.normpath(resume_model)
+            ckpt_path = os.path.join(resume_model, 'model.pdparams')
+            para_state_dict = paddle.load(ckpt_path)
+            ckpt_path = os.path.join(resume_model, 'model.pdopt')
+            opti_state_dict = paddle.load(ckpt_path)
+            model.set_state_dict(para_state_dict)
+            optimizer.set_state_dict(opti_state_dict)
+
+            iter = resume_model.split('_')[-1]
+            iter = int(iter)
+            return iter
+        else:
+            raise ValueError(
+                'The directory of the model to resume was not found: {}'.format(
+                    resume_model))
+    else:
+        logger.info('No model needed to resume.')
diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/progbar.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/progbar.py
new file mode 100644
index 0000000000000000000000000000000000000000..a1d33bb810c2ff27571269c80cae108b0c0005ac
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/progbar.py
@@ -0,0 +1,214 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+Copy-paste from PaddleSeg with minor modifications.
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/utils/progbar.py
+"""
+
+import os
+import sys
+import time
+
+import numpy as np
+
+
+class Progbar(object):
+    """
+    Displays a progress bar.
+ It refers to https://github.com/keras-team/keras/blob/keras-2/keras/utils/generic_utils.py + + Args: + target (int): Total number of steps expected, None if unknown. + width (int): Progress bar width on screen. + verbose (int): Verbosity mode, 0 (silent), 1 (verbose), 2 (semi-verbose) + stateful_metrics (list|tuple): Iterable of string names of metrics that should *not* be + averaged over time. Metrics in this list will be displayed as-is. All + others will be averaged by the progbar before display. + interval (float): Minimum visual progress update interval (in seconds). + unit_name (str): Display name for step counts (usually "step" or "sample"). + """ + + def __init__(self, + target, + width=30, + verbose=1, + interval=0.05, + stateful_metrics=None, + unit_name='step'): + self.target = target + self.width = width + self.verbose = verbose + self.interval = interval + self.unit_name = unit_name + if stateful_metrics: + self.stateful_metrics = set(stateful_metrics) + else: + self.stateful_metrics = set() + + self._dynamic_display = ((hasattr(sys.stderr, 'isatty') + and sys.stderr.isatty()) + or 'ipykernel' in sys.modules + or 'posix' in sys.modules + or 'PYCHARM_HOSTED' in os.environ) + self._total_width = 0 + self._seen_so_far = 0 + # We use a dict + list to avoid garbage collection + # issues found in OrderedDict + self._values = {} + self._values_order = [] + self._start = time.time() + self._last_update = 0 + + def update(self, current, values=None, finalize=None): + """ + Updates the progress bar. + + Args: + current (int): Index of current step. + values (list): List of tuples: `(name, value_for_last_step)`. If `name` is in + `stateful_metrics`, `value_for_last_step` will be displayed as-is. + Else, an average of the metric over time will be displayed. + finalize (bool): Whether this is the last update for the progress bar. If + `None`, defaults to `current >= self.target`. + """ + + if finalize is None: + if self.target is None: + finalize = False + else: + finalize = current >= self.target + + values = values or [] + for k, v in values: + if k not in self._values_order: + self._values_order.append(k) + if k not in self.stateful_metrics: + # In the case that progress bar doesn't have a target value in the first + # epoch, both on_batch_end and on_epoch_end will be called, which will + # cause 'current' and 'self._seen_so_far' to have the same value. Force + # the minimal value to 1 here, otherwise stateful_metric will be 0s. + value_base = max(current - self._seen_so_far, 1) + if k not in self._values: + self._values[k] = [v * value_base, value_base] + else: + self._values[k][0] += v * value_base + self._values[k][1] += value_base + else: + # Stateful metrics output a numeric value. This representation + # means "take an average from a single value" but keeps the + # numeric formatting. 
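+ # Storing [v, 1] makes the display code below compute v / 1 == v, so
+ # stateful metrics are shown as-is instead of as a running average.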
+ self._values[k] = [v, 1] + self._seen_so_far = current + + now = time.time() + info = ' - %.0fs' % (now - self._start) + if self.verbose == 1: + if now - self._last_update < self.interval and not finalize: + return + + prev_total_width = self._total_width + if self._dynamic_display: + sys.stderr.write('\b' * prev_total_width) + sys.stderr.write('\r') + else: + sys.stderr.write('\n') + + if self.target is not None: + numdigits = int(np.log10(self.target)) + 1 + bar = ('%' + str(numdigits) + 'd/%d [') % (current, self.target) + prog = float(current) / self.target + prog_width = int(self.width * prog) + if prog_width > 0: + bar += ('=' * (prog_width - 1)) + if current < self.target: + bar += '>' + else: + bar += '=' + bar += ('.' * (self.width - prog_width)) + bar += ']' + else: + bar = '%7d/Unknown' % current + + self._total_width = len(bar) + sys.stderr.write(bar) + + if current: + time_per_unit = (now - self._start) / current + else: + time_per_unit = 0 + + if self.target is None or finalize: + if time_per_unit >= 1 or time_per_unit == 0: + info += ' %.0fs/%s' % (time_per_unit, self.unit_name) + elif time_per_unit >= 1e-3: + info += ' %.0fms/%s' % (time_per_unit * 1e3, self.unit_name) + else: + info += ' %.0fus/%s' % (time_per_unit * 1e6, self.unit_name) + else: + eta = time_per_unit * (self.target - current) + if eta > 3600: + eta_format = '%d:%02d:%02d' % (eta // 3600, + (eta % 3600) // 60, eta % 60) + elif eta > 60: + eta_format = '%d:%02d' % (eta // 60, eta % 60) + else: + eta_format = '%ds' % eta + + info = ' - ETA: %s' % eta_format + + for k in self._values_order: + info += ' - %s:' % k + if isinstance(self._values[k], list): + avg = np.mean( + self._values[k][0] / max(1, self._values[k][1])) + if abs(avg) > 1e-3: + info += ' %.4f' % avg + else: + info += ' %.4e' % avg + else: + info += ' %s' % self._values[k] + + self._total_width += len(info) + if prev_total_width > self._total_width: + info += (' ' * (prev_total_width - self._total_width)) + + if finalize: + info += '\n' + + sys.stderr.write(info) + sys.stderr.flush() + + elif self.verbose == 2: + if finalize: + numdigits = int(np.log10(self.target)) + 1 + count = ('%' + str(numdigits) + 'd/%d') % (current, self.target) + info = count + info + for k in self._values_order: + info += ' - %s:' % k + avg = np.mean( + self._values[k][0] / max(1, self._values[k][1])) + if avg > 1e-3: + info += ' %.4f' % avg + else: + info += ' %.4e' % avg + info += '\n' + + sys.stderr.write(info) + sys.stderr.flush() + + self._last_update = now + + def add(self, n, values=None): + self.update(self._seen_so_far + n, values) diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/timer.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/timer.py new file mode 100644 index 0000000000000000000000000000000000000000..6431b343b435369be7ee0a875eafd528211df5db --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/timer.py @@ -0,0 +1,58 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +""" +Copy-paste from PaddleSeg. +https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/paddleseg/utils/timer.py +""" + +import time + + +class TimeAverager(object): + def __init__(self): + self.reset() + + def reset(self): + self._cnt = 0 + self._total_time = 0 + self._total_samples = 0 + + def record(self, usetime, num_samples=None): + self._cnt += 1 + self._total_time += usetime + if num_samples: + self._total_samples += num_samples + + def get_average(self): + if self._cnt == 0: + return 0 + return self._total_time / float(self._cnt) + + def get_ips_average(self): + if not self._total_samples or self._cnt == 0: + return 0 + return float(self._total_samples) / self._total_time + + +def calculate_eta(remaining_step, speed): + if remaining_step < 0: + remaining_step = 0 + remaining_time = int(remaining_step * speed) + result = "{:0>2}:{:0>2}:{:0>2}" + arr = [] + for i in range(2, -1, -1): + arr.append(int(remaining_time / 60**i)) + remaining_time %= 60**i + return result.format(*arr) diff --git a/PaddleCV/3d_vision/SMOKE/smoke/utils/vis_utils.py b/PaddleCV/3d_vision/SMOKE/smoke/utils/vis_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..fbf48fcb8f17a483c2cea349ffa651d9caf592bb --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/smoke/utils/vis_utils.py @@ -0,0 +1,130 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +import cv2 +import numpy as np + +from smoke.ops import gather_op + + +def get_ratio(ori_img_size, output_size, down_ratio=(4, 4)): + return np.array([[down_ratio[1] * ori_img_size[1] / output_size[1], + down_ratio[0] * ori_img_size[0] / output_size[0]]], np.float32) + +def get_img(img_path): + img = cv2.imread(img_path) + ori_img_size = img.shape + img = cv2.resize(img, (960, 640)) + output_size = img.shape + img = img/255.0 + img = np.subtract(img, np.array([0.485, 0.456, 0.406])) + img = np.true_divide(img, np.array([0.229, 0.224, 0.225])) + img = np.array(img, np.float32) + img = img.transpose(2, 0, 1) + img = img[None,:,:,:] + img = paddle.to_tensor(img) + return img, ori_img_size, output_size + +def encode_box3d(rotys, dims, locs, K, image_size): + ''' + construct 3d bounding box for each object. 
+ Args: + rotys: rotation in shape N + dims: dimensions of objects + locs: locations of objects + + Returns: + box_3d in camera frame, shape(b, 2, 8) + ''' + if len(rotys.shape) == 2: + rotys = rotys.flatten() + if len(dims.shape) == 3: + dims = paddle.reshape(dims, (-1, 3)) + if len(locs.shape) == 3: + locs = paddle.reshape(locs, (-1, 3)) + + N = rotys.shape[0] + ry = rad_to_matrix(rotys, N) + + dims = paddle.reshape(dims, (-1, 1)).tile((1, 8)) + dims[::3, :4], dims[2::3, :4] = 0.5 * dims[::3, :4], 0.5 * dims[2::3, :4] + dims[::3, 4:], dims[2::3, 4:] = -0.5 * dims[::3, 4:], -0.5 * dims[2::3, 4:] + dims[1::3, :4], dims[1::3, 4:] = 0., -dims[1::3, 4:] + index = paddle.to_tensor([[4, 0, 1, 2, 3, 5, 6, 7], + [4, 5, 0, 1, 6, 7, 2, 3], + [4, 5, 6, 0, 1, 2, 3, 7]]).tile((N, 1)) + + box_3d_object = gather_op(dims, 1, index) + box_3d = paddle.matmul(ry, paddle.reshape(box_3d_object, (N, 3, -1))) + box_3d += locs.unsqueeze(-1).tile((1, 1, 8)) + + box3d_image = paddle.matmul(K, box_3d) + box3d_image = box3d_image[:, :2, :] / paddle.reshape(box3d_image[:, 2, :], (box_3d.shape[0], 1, box_3d.shape[2])) + box3d_image = box3d_image.astype("int32") + box3d_image = box3d_image.astype("float32") + + box3d_image[:, 0] = box3d_image[:, 0].clip(0, image_size[1]) + box3d_image[:, 1] = box3d_image[:, 1].clip(0, image_size[0]) + + return box3d_image + +def rad_to_matrix(rotys, N): + + cos, sin = rotys.cos(), rotys.sin() + + i_temp = paddle.to_tensor([[1, 0, 1], + [0, 1, 0], + [-1, 0, 1]]).astype("float32") + + ry = paddle.reshape(i_temp.tile((N, 1)), (N, -1, 3)) + + ry[:, 0, 0] *= cos + ry[:, 0, 2] *= sin + ry[:, 2, 0] *= sin + ry[:, 2, 2] *= cos + + return ry + + +def draw_box_3d(image, corners, color=None): + ''' Draw 3d bounding box in image + corners: (8,2) array of vertices for the 3d box in following order: + ''' + + # face_idx = [[0, 1, 5, 4], + # [1, 2, 6, 5], + # [2, 3, 7, 6], + # [3, 0, 4, 7]] + if color is None: + color = (0, 0, 255) + face_idx = [[5, 4, 3, 6], + [1, 2, 3, 4], + [1, 0, 7, 2], + [0, 5, 6, 7]] + for ind_f in range(3, -1, -1): + f = face_idx[ind_f] + for j in range(4): + cv2.line(image, (corners[f[j], 0], corners[f[j], 1]), + (corners[f[(j + 1) % 4], 0], corners[f[(j + 1) % 4], 1]), color, 2, lineType=cv2.LINE_AA) + if ind_f == 0: + cv2.line(image, (corners[f[0], 0], corners[f[0], 1]), + (corners[f[2], 0], corners[f[2], 1]), color, 1, lineType=cv2.LINE_AA) + cv2.line(image, (corners[f[1], 0], corners[f[1], 1]), + (corners[f[3], 0], corners[f[3], 1]), color, 1, lineType=cv2.LINE_AA) + + return image \ No newline at end of file diff --git a/PaddleCV/3d_vision/SMOKE/test.py b/PaddleCV/3d_vision/SMOKE/test.py new file mode 100644 index 0000000000000000000000000000000000000000..d1135c327aaac112cef8ec353ec8c1c47b8bc2e2 --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/test.py @@ -0,0 +1,100 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
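+
+# Single-image inference demo: load trained weights, run the model with a
+# fixed camera intrinsic matrix (the hard-coded K in main), and draw the
+# decoded 3D boxes onto the input image.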
+ +import argparse +import os + +import cv2 +import numpy as np +import paddle + +from smoke.cvlibs import Config +from smoke.utils import logger, load_pretrained_model +from smoke.utils.vis_utils import get_img, get_ratio, encode_box3d, draw_box_3d + +def parse_args(): + parser = argparse.ArgumentParser(description='Model test') + + # params of evaluate + parser.add_argument( + "--config", dest="cfg", help="The config file.", required=True, type=str) + parser.add_argument( + '--model_path', + dest='model_path', + help='The path of model for evaluation', + type=str, + required=True) + parser.add_argument( + '--input_path', + dest='input_path', + help='The image path', + type=str, + required=True) + parser.add_argument( + '--output_path', + dest='output_path', + help='The result path of image', + type=str, + required=True) + + + return parser.parse_args() + + +def main(args): + + paddle.set_device("gpu") + + cfg = Config(args.cfg) + + model = cfg.model + model.eval() + if args.model_path: + load_pretrained_model(model, args.model_path) + logger.info('Loaded trained params of model successfully') + K = np.array([[[2055.56, 0, 939.658], [0, 2055.56, 641.072], [0, 0, 1]]], np.float32) + K_inverse = np.linalg.inv(K) + K_inverse = paddle.to_tensor(K_inverse) + + img, ori_img_size, output_size = get_img(args.input_path) + + ratio = get_ratio(ori_img_size, output_size) + ratio = paddle.to_tensor(ratio) + cam_info = [K_inverse, ratio] + total_pred = model(img, cam_info) + + keep_idx = paddle.nonzero(total_pred[:, -1] > 0.25) + total_pred = paddle.gather(total_pred, keep_idx) + + if total_pred.shape[0] > 0: + pred_dimensions = total_pred[:, 6:9] + pred_dimensions = pred_dimensions.roll(shifts=1, axis=1) + pred_rotys = total_pred[:, 12] + pred_locations = total_pred[:, 9:12] + bbox_3d = encode_box3d(pred_rotys, pred_dimensions, pred_locations, paddle.to_tensor(K), (1280, 1920)) + else: + bbox_3d = total_pred + + img_draw = cv2.imread(args.input_path) + for idx in range(bbox_3d.shape[0]): + bbox = bbox_3d[idx] + bbox = bbox.transpose([1,0]).numpy() + img_draw = draw_box_3d(img_draw, bbox) + + cv2.imwrite(args.output_path, img_draw) + + +if __name__ == '__main__': + args = parse_args() + main(args) diff --git a/PaddleCV/3d_vision/SMOKE/tools/kitti_eval_offline/evaluate_object_offline_40p.cpp b/PaddleCV/3d_vision/SMOKE/tools/kitti_eval_offline/evaluate_object_offline_40p.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a830e0a9c31d6fa36952cb9b8c69408a0e9697cb --- /dev/null +++ b/PaddleCV/3d_vision/SMOKE/tools/kitti_eval_offline/evaluate_object_offline_40p.cpp @@ -0,0 +1,959 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include + +#include +#include +#include +#include + +#include "mail.h" + +BOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian) + +typedef boost::geometry::model::polygon > Polygon; + + +using namespace std; + +/*======================================================================= +STATIC EVALUATION PARAMETERS +=======================================================================*/ + +// holds the number of test images on the server +const int32_t N_TESTIMAGES = 7518; +//const int32_t N_TESTIMAGES = 7480; + +// easy, moderate and hard evaluation level +enum DIFFICULTY{EASY=0, MODERATE=1, HARD=2}; + +// evaluation metrics: image, ground or 3D +enum METRIC{IMAGE=0, GROUND=1, BOX3D=2}; + +// evaluation parameter +const int32_t MIN_HEIGHT[3] = {40, 25, 25}; // minimum height for evaluated 
groundtruth/detections +const int32_t MAX_OCCLUSION[3] = {0, 1, 2}; // maximum occlusion level of the groundtruth used for evaluation +const double MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation + +// evaluated object classes +enum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2}; +const int NUM_CLASS = 3; + +// parameters varying per class +vector CLASS_NAMES; +vector CLASS_NAMES_CAP; +// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation +const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}}; + +// no. of recall steps that should be evaluated (discretized) +const double N_SAMPLE_PTS = 41; + +// initialize class names +void initGlobals () { + CLASS_NAMES.push_back("car"); + CLASS_NAMES.push_back("pedestrian"); + CLASS_NAMES.push_back("cyclist"); + CLASS_NAMES_CAP.push_back("Car"); + CLASS_NAMES_CAP.push_back("Pedestrian"); + CLASS_NAMES_CAP.push_back("Cyclist"); +} + +/*======================================================================= +DATA TYPES FOR EVALUATION +=======================================================================*/ + +// holding data needed for precision-recall and precision-aos +struct tPrData { + vector v; // detection score for computing score thresholds + double similarity; // orientation similarity + int32_t tp; // true positives + int32_t fp; // false positives + int32_t fn; // false negatives + tPrData () : + similarity(0), tp(0), fp(0), fn(0) {} +}; + +// holding bounding boxes for ground truth and detections +struct tBox { + string type; // object type as car, pedestrian or cyclist,... + double x1; // left corner + double y1; // top corner + double x2; // right corner + double y2; // bottom corner + double alpha; // image orientation + tBox (string type, double x1,double y1,double x2,double y2,double alpha) : + type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {} +}; + +// holding ground truth data +struct tGroundtruth { + tBox box; // object type, box, orientation + double truncation; // truncation 0..1 + int32_t occlusion; // occlusion 0,1,2 (non, partly, fully) + double ry; + double t1, t2, t3; + double h, w, l; + tGroundtruth () : + box(tBox("invalild",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {} + tGroundtruth (tBox box,double truncation,int32_t occlusion) : + box(box),truncation(truncation),occlusion(occlusion) {} + tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) : + box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {} +}; + +// holding detection data +struct tDetection { + tBox box; // object type, box, orientation + double thresh; // detection score + double ry; + double t1, t2, t3; + double h, w, l; + tDetection (): + box(tBox("invalid",-1,-1,-1,-1,-10)),thresh(-1000) {} + tDetection (tBox box,double thresh) : + box(box),thresh(thresh) {} + tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) : + box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {} +}; + + +/*======================================================================= +FUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS +=======================================================================*/ +vector indices; + +vector loadDetections(string file_name, bool &compute_aos, + vector &eval_image, vector &eval_ground, + vector &eval_3d, bool &success) { + + // holds all detections (ignored detections are indicated 
by an index vector + vector detections; + FILE *fp = fopen(file_name.c_str(),"r"); + if (!fp) { + success = false; + return detections; + } + while (!feof(fp)) { + tDetection d; + double trash; + char str[255]; + if (fscanf(fp, "%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf", + str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1, + &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3, + &d.ry, &d.thresh)==16) { + + // d.thresh = 1; + d.box.type = str; + detections.push_back(d); + + // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid + if(d.box.alpha == -10) + compute_aos = false; + + // a class is only evaluated if it is detected at least once + for (int c = 0; c < NUM_CLASS; c++) { + if (!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str()) || !strcasecmp(d.box.type.c_str(), CLASS_NAMES_CAP[c].c_str())) { + if (!eval_image[c] && d.box.x1 >= 0) + eval_image[c] = true; + if (!eval_ground[c] && d.t1 != -1000 && d.t3 != -1000 && d.w > 0 && d.l > 0) + eval_ground[c] = true; + if (!eval_3d[c] && d.t1 != -1000 && d.t2 != -1000 && d.t3 != -1000 && d.h > 0 && d.w > 0 && d.l > 0) + eval_3d[c] = true; + break; + } + } + } + } + + fclose(fp); + success = true; + return detections; +} + +vector loadGroundtruth(string file_name,bool &success) { + + // holds all ground truth (ignored ground truth is indicated by an index vector + vector groundtruth; + FILE *fp = fopen(file_name.c_str(),"r"); + if (!fp) { + success = false; + return groundtruth; + } + while (!feof(fp)) { + tGroundtruth g; + char str[255]; + if (fscanf(fp, "%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf", + str, &g.truncation, &g.occlusion, &g.box.alpha, + &g.box.x1, &g.box.y1, &g.box.x2, &g.box.y2, + &g.h, &g.w, &g.l, &g.t1, + &g.t2, &g.t3, &g.ry )==15) { + g.box.type = str; + groundtruth.push_back(g); + } + } + fclose(fp); + success = true; + return groundtruth; +} + +void saveStats (const vector &precision, const vector &aos, FILE *fp_det, FILE *fp_ori) { + + // save precision to file + if(precision.empty()) + return; + for (int32_t i=0; i +Polygon toPolygon(const T& g) { + using namespace boost::numeric::ublas; + using namespace boost::geometry; + matrix mref(2, 2); + mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry); + mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry); + + static int count = 0; + matrix corners(2, 4); + double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2, + g.w / 2, -g.w / 2, -g.w / 2, g.w / 2}; + std::copy(data, data + 8, corners.data().begin()); + matrix gc = prod(mref, corners); + for (int i = 0; i < 4; ++i) { + gc(0, i) += g.t1; + gc(1, i) += g.t3; + } + + double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}}; + Polygon poly; + append(poly, points); + return poly; +} + +// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz) +inline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) { + using namespace boost::geometry; + Polygon gp = toPolygon(g); + Polygon dp = toPolygon(d); + + std::vector in, un; + intersection(gp, dp, in); + union_(gp, dp, un); + + double inter_area = in.empty() ? 
0 : area(in.front()); + double union_area = area(un.front()); + double o; + if(criterion==-1) // union + o = inter_area / union_area; + else if(criterion==0) // bbox_a + o = inter_area / area(dp); + else if(criterion==1) // bbox_b + o = inter_area / area(gp); + + return o; +} + +// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz) +inline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) { + using namespace boost::geometry; + Polygon gp = toPolygon(g); + Polygon dp = toPolygon(d); + + std::vector in, un; + intersection(gp, dp, in); + union_(gp, dp, un); + + double ymax = min(d.t2, g.t2); + double ymin = max(d.t2 - d.h, g.t2 - g.h); + + double inter_area = in.empty() ? 0 : area(in.front()); + double inter_vol = inter_area * max(0.0, ymax - ymin); + + double det_vol = d.h * d.l * d.w; + double gt_vol = g.h * g.l * g.w; + + double o; + if(criterion==-1) // union + o = inter_vol / (det_vol + gt_vol - inter_vol); + else if(criterion==0) // bbox_a + o = inter_vol / det_vol; + else if(criterion==1) // bbox_b + o = inter_vol / gt_vol; + + return o; +} + +vector getThresholds(vector &v, double n_groundtruth){ + + // holds scores needed to compute N_SAMPLE_PTS recall values + vector t; + + // sort scores in descending order + // (highest score is assumed to give best/most confident detections) + sort(v.begin(), v.end(), greater()); + + // get scores for linearly spaced recall + double current_recall = 0; + for(int32_t i=0; i >, const vector &det, vector &ignored_gt, vector &dc, vector &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){ + + // extract ground truth bounding boxes for current evaluation class + for(int32_t i=0;iMAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<=MIN_HEIGHT[difficulty]) + ignore = true; + + // set ignored vector for ground truth + // current class and not ignored (total no. 
of ground truth is detected for recall denominator) + if(valid_class==1 && !ignore){ + ignored_gt.push_back(0); + n_gt++; + } + + // neighboring class, or current class but ignored + else if(valid_class==0 || (ignore && valid_class==1)) + ignored_gt.push_back(1); + + // all other classes which are FN in the evaluation + else + ignored_gt.push_back(-1); + } + + // extract dontcare areas + for(int32_t i=0;i >, + const vector &det, const vector &dc, + const vector &ignored_gt, const vector &ignored_det, + bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t), + METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){ + + tPrData stat = tPrData(); + const double NO_DETECTION = -10000000; + vector delta; // holds angular difference for TPs (needed for AOS evaluation) + vector assigned_detection; // holds wether a detection was assigned to a valid or ignored ground truth + assigned_detection.assign(det.size(), false); + vector ignored_threshold; + ignored_threshold.assign(det.size(), false); // holds detections with a threshold lower than thresh if FP are computed + + // detections with a low score are ignored for computing precision (needs FP) + if(compute_fp) + for(int32_t i=0; i 0.5) (logical len(det)) + =======================================================================*/ + int32_t det_idx = -1; + double valid_detection = NO_DETECTION; + double max_overlap = 0; + + // search for a possible detection + bool assigned_ignored_det = false; + for(int32_t j=0; jMIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){ + det_idx = j; + valid_detection = det[j].thresh; + } + + // for computing pr curve values, the candidate with the greatest overlap is considered + // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used + else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){ + max_overlap = overlap; + det_idx = j; + valid_detection = 1; + assigned_ignored_det = false; + } + else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){ + det_idx = j; + valid_detection = 1; + assigned_ignored_det = true; + } + } + + /*======================================================================= + compute TP, FP and FN + =======================================================================*/ + + // nothing was assigned to this valid ground truth + if(valid_detection==NO_DETECTION && ignored_gt[i]==0) { + stat.fn++; + } + + // only evaluate valid ground truth <=> detection assignments (considering difficulty level) + else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1)) + assigned_detection[det_idx] = true; + + // found a valid true positive + else if(valid_detection!=NO_DETECTION){ + + // write highest score to threshold vector + stat.tp++; + stat.v.push_back(det[det_idx].thresh); + + // compute angular difference of detection and ground truth if valid detection orientation was provided + if(compute_aos) + delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha); + + // clean up + assigned_detection[det_idx] = true; + } + } + + // if FP are requested, consider stuff area + if(compute_fp){ + + // count fp + for(int32_t i=0; iMIN_OVERLAP[metric][current_class]){ + assigned_detection[j] = true; + nstuff++; + } + } + } + + // FP = no. 
of all not to ground truth assigned detections - detections assigned to stuff areas + stat.fp -= nstuff; + + // if all orientation values are valid, the AOS is computed + if(compute_aos){ + vector tmp; + + // FP have a similarity of 0, for all TP compute AOS + tmp.assign(stat.fp, 0); + for(int32_t i=0; i0 || stat.fp>0) + stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0); + + // there was neither a FP nor a TP, so the similarity is ignored in the evaluation + else + stat.similarity = -1; + } + } + return stat; +} + +/*======================================================================= +EVALUATE CLASS-WISE +=======================================================================*/ + +bool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class, + const vector< vector > &groundtruth, + const vector< vector > &detections, bool compute_aos, + double (*boxoverlap)(tDetection, tGroundtruth, int32_t), + vector &precision, vector &aos, + DIFFICULTY difficulty, METRIC metric) { + assert(groundtruth.size() == detections.size()); + + // init + int32_t n_gt=0; // total no. of gt (denominator of recall) + vector v, thresholds; // detection scores, evaluated for recall discretization + vector< vector > ignored_gt, ignored_det; // index of ignored gt detection for current class/difficulty + vector< vector > dontcare; // index of dontcare areas, included in ground truth + + // for all test images do + for (int32_t i=0; i i_gt, i_det; + vector dc; + + // only evaluate objects of current class and ignore occluded, truncated objects + cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty); + ignored_gt.push_back(i_gt); + ignored_det.push_back(i_det); + dontcare.push_back(dc); + + // compute statistics to get recall values + tPrData pr_tmp = tPrData(); + pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric); + + // add detection scores to vector over all images + for(int32_t j=0; j pr; + pr.assign(thresholds.size(),tPrData()); + for (int32_t i=0; i recall; + precision.assign(N_SAMPLE_PTS, 0); + if(compute_aos) + aos.assign(N_SAMPLE_PTS, 0); + double r=0; + for (int32_t i=0; i vals[],bool is_aos){ + + char command[1024]; + + // save plot data to file + FILE *fp = fopen((dir_name + "/" + file_name + ".txt").c_str(),"w"); + printf("save %s\n", (dir_name + "/" + file_name + ".txt").c_str()); + for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++) + fprintf(fp,"%f %f %f %f\n",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]); + fclose(fp); + + float sum[3] = {0, 0, 0}; + for (int v = 0; v < 3; ++v) + for (int i = 1; i < vals[v].size(); i = i + 1) + sum[v] += vals[v][i]; + printf("%s AP: %f %f %f\n", file_name.c_str(), sum[0] / 40 * 100, sum[1] / 40 * 100, sum[2] / 40 * 100); + + + // create png + eps + for (int32_t j=0; j<2; j++) { + + // open file + FILE *fp = fopen((dir_name + "/" + file_name + ".gp").c_str(),"w"); + + // save gnuplot instructions + if (j==0) { + fprintf(fp,"set term png size 450,315 font \"Helvetica\" 11\n"); + fprintf(fp,"set output \"%s.png\"\n",file_name.c_str()); + } else { + fprintf(fp,"set term postscript eps enhanced color font \"Helvetica\" 20\n"); + fprintf(fp,"set output \"%s.eps\"\n",file_name.c_str()); + } + + // set labels and ranges + fprintf(fp,"set size ratio 0.7\n"); + fprintf(fp,"set xrange [0:1]\n"); + fprintf(fp,"set yrange [0:1]\n"); + fprintf(fp,"set xlabel \"Recall\"\n"); + if (!is_aos) fprintf(fp,"set ylabel \"Precision\"\n"); + else fprintf(fp,"set 
+
+void saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){
+
+  char command[1024];
+
+  // save plot data to file
+  FILE *fp = fopen((dir_name + "/" + file_name + ".txt").c_str(),"w");
+  printf("save %s\n", (dir_name + "/" + file_name + ".txt").c_str());
+  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)
+    fprintf(fp,"%f %f %f %f\n",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);
+  fclose(fp);
+
+  float sum[3] = {0, 0, 0};
+  for (int v = 0; v < 3; ++v)
+    for (int i = 1; i < vals[v].size(); i++)
+      sum[v] += vals[v][i];
+  printf("%s AP: %f %f %f\n", file_name.c_str(), sum[0] / 40 * 100, sum[1] / 40 * 100, sum[2] / 40 * 100);
+
+  // create png + eps
+  for (int32_t j=0; j<2; j++) {
+
+    // open file
+    FILE *fp = fopen((dir_name + "/" + file_name + ".gp").c_str(),"w");
+
+    // save gnuplot instructions
+    if (j==0) {
+      fprintf(fp,"set term png size 450,315 font \"Helvetica\" 11\n");
+      fprintf(fp,"set output \"%s.png\"\n",file_name.c_str());
+    } else {
+      fprintf(fp,"set term postscript eps enhanced color font \"Helvetica\" 20\n");
+      fprintf(fp,"set output \"%s.eps\"\n",file_name.c_str());
+    }
+
+    // set labels and ranges
+    fprintf(fp,"set size ratio 0.7\n");
+    fprintf(fp,"set xrange [0:1]\n");
+    fprintf(fp,"set yrange [0:1]\n");
+    fprintf(fp,"set xlabel \"Recall\"\n");
+    if (!is_aos) fprintf(fp,"set ylabel \"Precision\"\n");
+    else         fprintf(fp,"set ylabel \"Orientation Similarity\"\n");
+    obj_type[0] = toupper(obj_type[0]);
+    fprintf(fp,"set title \"%s\"\n",obj_type.c_str());
+
+    // line width
+    int32_t lw = 5;
+    if (j==0) lw = 3;
+
+    // plot error curve
+    fprintf(fp,"plot ");
+    fprintf(fp,"\"%s.txt\" using 1:2 title 'Easy' with lines ls 1 lw %d,",file_name.c_str(),lw);
+    fprintf(fp,"\"%s.txt\" using 1:3 title 'Moderate' with lines ls 2 lw %d,",file_name.c_str(),lw);
+    fprintf(fp,"\"%s.txt\" using 1:4 title 'Hard' with lines ls 3 lw %d",file_name.c_str(),lw);
+
+    // close file
+    fclose(fp);
+
+    // run gnuplot => create png + eps
+    sprintf(command,"cd %s; gnuplot %s",dir_name.c_str(),(file_name + ".gp").c_str());
+    system(command);
+  }
+
+  // create pdf and crop
+  sprintf(command,"cd %s; ps2pdf %s.eps %s_large.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
+  system(command);
+  sprintf(command,"cd %s; pdfcrop %s_large.pdf %s.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
+  system(command);
+  sprintf(command,"cd %s; rm %s_large.pdf",dir_name.c_str(),file_name.c_str());
+  system(command);
+}
+
+vector<int32_t> getEvalIndices(const string& result_dir) {
+
+  vector<int32_t> indices;
+  DIR* dir;
+  dirent* entity;
+  dir = opendir(result_dir.c_str());
+  if (dir) {
+    while ((entity = readdir(dir)) != NULL) {
+      string path(entity->d_name);
+      int32_t len = path.size();
+      if (len < 10) continue;
+      int32_t index = atoi(path.substr(len - 10, 10).c_str());
+      indices.push_back(index);
+    }
+  }
+  return indices;
+}
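+
+// Note (added for clarity): the AP printed by saveAndPlotPlots is the mean
+// precision over recall samples 1..N_SAMPLE_PTS-1 (the sum skips index 0 and
+// divides by 40), i.e. the 40-point metric. getEvalIndices only parses the
+// trailing "dddddd.txt" characters of each directory entry, so result files
+// must be named %06d.txt (e.g. 000123.txt), one file per image.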
+
+bool eval(string gt_dir, string result_dir, Mail* mail){
+
+  // set some global parameters
+  initGlobals();
+
+  // directory for precision/recall plots
+  string plot_dir = result_dir + "/plot";
+
+  // create output directories
+  system(("mkdir " + plot_dir).c_str());
+
+  // hold detections and ground truth in memory
+  vector< vector<tGroundtruth> > groundtruth;
+  vector< vector<tDetection> >  detections;
+
+  // holds whether orientation similarity shall be computed (might be set to false while loading detections)
+  // and which labels were provided by this submission
+  bool compute_aos=true;
+  vector<bool> eval_image(NUM_CLASS, false);
+  vector<bool> eval_ground(NUM_CLASS, false);
+  vector<bool> eval_3d(NUM_CLASS, false);
+
+  // for all images read groundtruth and detections
+  mail->msg("Loading detections...");
+  std::vector<int32_t> indices = getEvalIndices(result_dir + "/data/");
+  printf("number of files for evaluation: %d\n", (int)indices.size());
+
+  for (int32_t i=0; i<indices.size(); i++) {
+
+    // file name
+    char file_name[256];
+    sprintf(file_name,"%06d.txt",indices.at(i));
+
+    // read ground truth and result poses
+    bool gt_success,det_success;
+    vector<tGroundtruth> gt  = loadGroundtruth(gt_dir + "/" + file_name,gt_success);
+    vector<tDetection>   det = loadDetections(result_dir + "/data/" + file_name,
+        compute_aos, eval_image, eval_ground, eval_3d, det_success);
+    groundtruth.push_back(gt);
+    detections.push_back(det);
+
+    // check for errors
+    if (!gt_success) {
+      mail->msg("ERROR: Couldn't read ground truth file: %s", file_name);
+      return false;
+    }
+    if (!det_success) {
+      mail->msg("ERROR: Couldn't read detection file: %s", file_name);
+      return false;
+    }
+  }
+  mail->msg("  done.");
+
+  // holds pointers for result files
+  FILE *fp_det=0, *fp_ori=0;
+
+  // eval image 2D bounding boxes
+  for (int c = 0; c < NUM_CLASS; c++) {
+    CLASSES cls = (CLASSES)c;
+    if (eval_image[c]) {
+      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection.txt").c_str(), "w");
+      if(compute_aos)
+        fp_ori = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_orientation.txt").c_str(),"w");
+      vector<double> precision[3], aos[3];
+      if( !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)
+       || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)
+       || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {
+        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
+        return false;
+      }
+      fclose(fp_det);
+      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection", CLASS_NAMES[c], precision, 0);
+      if(compute_aos){
+        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_orientation", CLASS_NAMES[c], aos, 1);
+        fclose(fp_ori);
+      }
+    }
+  }
+
+  // don't evaluate AOS for birdview boxes and 3D boxes
+  compute_aos = false;
+
+  // eval bird's eye view bounding boxes
+  for (int c = 0; c < NUM_CLASS; c++) {
+    CLASSES cls = (CLASSES)c;
+    if (eval_ground[c]) {
+      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_ground.txt").c_str(), "w");
+      vector<double> precision[3], aos[3];
+      if( !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], aos[0], EASY, GROUND)
+       || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)
+       || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {
+        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
+        return false;
+      }
+      fclose(fp_det);
+      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_ground", CLASS_NAMES[c], precision, 0);
+    }
+  }
+
+  // eval 3D bounding boxes
+  for (int c = 0; c < NUM_CLASS; c++) {
+    CLASSES cls = (CLASSES)c;
+    if (eval_3d[c]) {
+      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_3d.txt").c_str(), "w");
+      vector<double> precision[3], aos[3];
+      if( !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)
+       || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)
+       || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {
+        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
+        return false;
+      }
+      fclose(fp_det);
+      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_3d", CLASS_NAMES[c], precision, 0);
+    }
+  }
+
+  // success
+  return true;
+}
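+
+/*=======================================================================
+Note (added for clarity): eval() expects the following layout, derived from
+the loading code above:
+  <gt_dir>/           %06d.txt KITTI ground-truth label files
+  <result_dir>/data/  %06d.txt detection result files
+  <result_dir>/plot/  created here; receives PR curves and gnuplot output
+Per-class statistics are written to <result_dir>/stats_<class>_*.txt.
+=======================================================================*/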
+
+int32_t main (int32_t argc,char *argv[]) {
+
+  // we need exactly two arguments: the ground truth and the result directory
+  if (argc!=3) {
+    cout << "Usage: ./evaluate_object_3d_offline gt_dir result_dir" << endl;
+    return 1;
+  }
+
+  // read arguments
+  string gt_dir = argv[1];
+  string result_dir = argv[2];
+
+  // init notification mail
+  Mail *mail;
+  mail = new Mail();
+  mail->msg("Thank you for participating in our evaluation!");
+
+  // run evaluation
+  if (eval(gt_dir, result_dir, mail)) {
+    mail->msg("Your evaluation results are available at:");
+    mail->msg(result_dir.c_str());
+  } else {
+    system(("rm -r " + result_dir + "/plot").c_str());
+    mail->msg("An error occurred while processing your results.");
+  }
+
+  // send mail and exit
+  delete mail;
+
+  return 0;
+}
\ No newline at end of file
diff --git a/PaddleCV/3d_vision/SMOKE/tools/kitti_eval_offline/mail.h b/PaddleCV/3d_vision/SMOKE/tools/kitti_eval_offline/mail.h
new file mode 100644
index 0000000000000000000000000000000000000000..20fa986b2f705621667d303036671c411799549a
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/tools/kitti_eval_offline/mail.h
@@ -0,0 +1,48 @@
+#ifndef MAIL_H
+#define MAIL_H
+
+#include <stdio.h>
+#include <string.h>
+#include <stdarg.h>
+
+class Mail {
+
+public:
+
+  Mail (std::string email = "") {
+    if (email.compare("")) {
+      mail = popen("/usr/lib/sendmail -t -f noreply@cvlibs.net","w");
+      fprintf(mail,"To: %s\n", email.c_str());
+      fprintf(mail,"From: noreply@cvlibs.net\n");
+      fprintf(mail,"Subject: KITTI Evaluation Benchmark\n");
+      fprintf(mail,"\n\n");
+    } else {
+      mail = 0;
+    }
+  }
+
+  ~Mail() {
+    if (mail) {
+      pclose(mail);
+    }
+  }
+
+  void msg (const char *format, ...) {
+    va_list args;
+    va_start(args,format);
+    if (mail) {
+      vfprintf(mail,format,args);
+      fprintf(mail,"\n");
+    }
+    vprintf(format,args);
+    printf("\n");
+    va_end(args);
+  }
+
+private:
+
+  FILE *mail;
+
+};
+
+#endif
diff --git a/PaddleCV/3d_vision/SMOKE/train.py b/PaddleCV/3d_vision/SMOKE/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..cc3447f0002e39672581d0a199b06e03727d2bcc
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/train.py
@@ -0,0 +1,134 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Copy-paste from PaddleSeg with minor modifications.
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/train.py
+"""
+
+import argparse
+
+import paddle
+
+from smoke.cvlibs import manager, Config
+from smoke.utils import logger
+from smoke.core import train
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='Model training')
+    # params of training
+    parser.add_argument(
+        "--config", dest="cfg", help="The config file.", required=True, type=str)
+    parser.add_argument(
+        '--iters',
+        dest='iters',
+        help='Iterations for training',
+        type=int,
+        default=None)
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size of one gpu or cpu',
+        type=int,
+        default=None)
+    parser.add_argument(
+        '--learning_rate',
+        dest='learning_rate',
+        help='Learning rate',
+        type=float,
+        default=None)
+    parser.add_argument(
+        '--save_interval',
+        dest='save_interval',
+        help='How many iters to save a model snapshot once during training.',
+        type=int,
+        default=1000)
+    parser.add_argument(
+        '--resume_model',
+        dest='resume_model',
+        help='The path of the model to resume from',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the model snapshot',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        '--keep_checkpoint_max',
+        dest='keep_checkpoint_max',
+        help='Maximum number of checkpoints to save',
+        type=int,
+        default=5)
+    parser.add_argument(
+        '--num_workers',
+        dest='num_workers',
+        help='Num workers for data loader',
+        type=int,
+        default=0)
+    parser.add_argument(
+        '--log_iters',
+        dest='log_iters',
+        help='Display logging information at every log_iters',
+        default=10,
+        type=int)
+
+    return parser.parse_args()
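+
+# Note (added): --iters, --batch_size and --learning_rate default to None. When a
+# flag is left unset, Config presumably falls back to the value from the YAML
+# file, so these CLI options act as optional overrides of the config.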
+
+
+def main(args):
+
+    paddle.set_device("gpu")
+
+    cfg = Config(
+        args.cfg,
+        learning_rate=args.learning_rate,
+        iters=args.iters,
+        batch_size=args.batch_size)
+
+    train_dataset = cfg.train_dataset
+    if train_dataset is None:
+        raise RuntimeError(
+            'The training dataset is not specified in the configuration file.')
+    elif len(train_dataset) == 0:
+        raise ValueError(
+            'The length of train_dataset is 0. Please check if your dataset is valid.')
+    val_dataset = None  # cfg.val_dataset if args.do_eval else None
+    losses = cfg.loss
+
+    msg = '\n---------------Config Information---------------\n'
+    msg += str(cfg)
+    msg += '------------------------------------------------'
+    logger.info(msg)
+
+    train(
+        cfg.model,
+        train_dataset,
+        val_dataset=val_dataset,
+        optimizer=cfg.optimizer,
+        loss_computation=losses,
+        save_dir=args.save_dir,
+        iters=cfg.iters,
+        batch_size=cfg.batch_size,
+        resume_model=args.resume_model,
+        save_interval=args.save_interval,
+        log_iters=args.log_iters,
+        num_workers=args.num_workers,
+        keep_checkpoint_max=args.keep_checkpoint_max)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    main(args)
diff --git a/PaddleCV/3d_vision/SMOKE/val.py b/PaddleCV/3d_vision/SMOKE/val.py
new file mode 100644
index 0000000000000000000000000000000000000000..c48d22629e3ee615613927a37b2c1a9c533e368b
--- /dev/null
+++ b/PaddleCV/3d_vision/SMOKE/val.py
@@ -0,0 +1,94 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Copy-paste from PaddleSeg with minor modifications.
+https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.1/val.py
+"""
+
+import argparse
+import os
+
+import paddle
+
+from smoke.cvlibs import manager, Config
+from smoke.core import evaluate
+from smoke.utils import logger, load_pretrained_model
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='Model evaluation')
+
+    # params of evaluate
+    parser.add_argument(
+        "--config", dest="cfg", help="The config file.", default=None, required=True, type=str)
+    parser.add_argument(
+        '--model_path',
+        dest='model_path',
+        help='The path of the model for evaluation',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--num_workers',
+        dest='num_workers',
+        help='Num workers for data loader',
+        type=int,
+        default=0)
+    parser.add_argument(
+        '--output_dir',
+        dest='output_dir',
+        help='The directory for saving the evaluation results',
+        type=str,
+        default='./output')
+
+    return parser.parse_args()
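+
+# Note (added, an assumption rather than documented behavior): if `evaluate`
+# dumps predictions in KITTI label format under --output_dir, they can be
+# scored with the offline devkit shipped in tools/kitti_eval_offline, e.g.:
+#   ./evaluate_object_3d_offline path_to_kitti/training/label_2 ./output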
+
+
+def main(args):
+
+    paddle.set_device("gpu")
+
+    cfg = Config(args.cfg)
+    val_dataset = cfg.val_dataset
+    if val_dataset is None:
+        raise RuntimeError(
+            'The validation dataset is not specified in the configuration file.')
+    elif len(val_dataset) == 0:
+        raise ValueError(
+            'The length of val_dataset is 0. Please check if your dataset is valid.')
+
+    msg = '\n---------------Config Information---------------\n'
+    msg += str(cfg)
+    msg += '------------------------------------------------'
+    logger.info(msg)
+
+    model = cfg.model
+    if args.model_path:
+        load_pretrained_model(model, args.model_path)
+        logger.info('Loaded trained params of model successfully')
+
+    evaluate(
+        model,
+        val_dataset,
+        num_workers=args.num_workers,
+        output_dir=args.output_dir)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    main(args)