Model trained with PaddleX fails when deployed on Jetson Nano via the Python API
Created by: jedibobo
Hardware: Jetson Nano, JetPack 4.3, CUDA 10.0, cuDNN 7.6.3. Python environment: paddlepaddle-gpu==2.0.0alpha0 (which I compiled myself) plus paddlex. Installing paddlex complained that OpenCV could not be found (it is actually present, linked in with ln), so I installed the packages one at a time with --no-dependencies.

The model was trained on AI Studio using https://aistudio.baidu.com/aistudio/projectdetail/633900, and I pruned it and retrained it following the method in the documentation. Training code:

    import paddlex as pdx

    # train_dataset, eval_dataset and anchor_sizes are defined earlier in the notebook
    num_classes = len(train_dataset.labels)
    print('class num:', num_classes)
    model = pdx.det.YOLOv3(num_classes=num_classes, backbone='MobileNetV3_large', anchors=anchor_sizes)
    model.train(
        num_epochs=20,
        train_dataset=train_dataset,
        train_batch_size=4,
        eval_dataset=eval_dataset,
        learning_rate=0.00025,
        lr_decay_epochs=[10, 15],
        save_interval_epochs=2,
        log_interval_steps=100,
        save_dir='./yolov3_mobilenetv3_prune',
        pretrain_weights='./yolov3_MobileNetV3_large/best_model',
        # pretrain_weights='IMAGENET',
        use_vdl=True,
        sensitivities_file='./sensitivities.data',
        eval_metric_loss=0.10)
I then exported it as an inference model, fixing the input size because I want to use TensorRT:

    !paddlex --export_inference --model_dir=./yolov3_mobilenetv3_prune/best_model --save_dir=./yolov3_mobilenetv3_inference_model --fixed_input_shape=[512,512]
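With the input pinned to 512x512, every downstream stage (including a TensorRT engine) sees fixed feature-map sizes. As a rough sanity check, the sketch below parses a --fixed_input_shape value and computes the YOLOv3 output grid sizes, assuming the standard YOLOv3 output strides of 32, 16 and 8, which is also why each input side should be a multiple of 32. The helper names are hypothetical and not part of PaddleX; the check does not depend on the order of the two values.

```python
from typing import List, Tuple


def parse_fixed_input_shape(value: str) -> Tuple[int, int]:
    """Parse a string such as '[512,512]' into two ints (hypothetical helper)."""
    parts = value.strip().strip("[]").split(",")
    if len(parts) != 2:
        raise ValueError(f"expected two comma-separated values, got {value!r}")
    return int(parts[0]), int(parts[1])


def yolov3_grid_sizes(shape: Tuple[int, int], strides=(32, 16, 8)) -> List[Tuple[int, int]]:
    """Feature-map size at each YOLOv3 output stride (assumes the standard 32/16/8 strides)."""
    a, b = shape
    # The coarsest head downsamples by 32, so both sides must divide evenly by 32.
    if a % 32 or b % 32:
        raise ValueError(f"input sides must be multiples of 32, got {shape}")
    return [(a // s, b // s) for s in strides]


if __name__ == "__main__":
    shape = parse_fixed_input_shape("[512,512]")
    print(yolov3_grid_sizes(shape))  # [(16, 16), (32, 32), (64, 64)]
```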
Actual deployment fails with the following error:

    Traceback (most recent call last):
      File "test.py", line 20, in <module>
        result = model.predict(image_name, eval_transforms)
      File "/home/dlinano/envs/paddle4-paddlex/lib/python3.6/site-packages/paddlex/deploy.py", line 235, in predict
        model_pred = self.raw_predict(preprocessed_input)
      File "/home/dlinano/envs/paddle4-paddlex/lib/python3.6/site-packages/paddlex/deploy.py", line 217, in raw_predict
        self.predictor.zero_copy_run()
    RuntimeError: parallel_for failed: too many resources requested for launch
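For context, "too many resources requested for launch" is the CUDA runtime's message for cudaErrorLaunchOutOfResources, and the "parallel_for failed" prefix appears to come from Thrust's CUDA backend. This error is checked per kernel launch against per-block limits (registers, threads, shared memory), so it depends on which kernels the graph's ops launch and with what block size, not on how large the weight file is. The sketch below illustrates only the register part of that check, assuming the per-block limits deviceQuery reports on a Jetson Nano (32768 registers, 1024 threads) versus 65536 registers on typical recent desktop GPUs. The 40-registers-per-thread kernel is hypothetical, and real hardware rounds register allocation up per warp, which this simplified check ignores.

```python
# Per-block register limit reported by deviceQuery on a Jetson Nano (compute capability 5.3).
NANO_REGS_PER_BLOCK = 32768
# Per-block register limit on typical recent desktop GPUs.
DESKTOP_REGS_PER_BLOCK = 65536
MAX_THREADS_PER_BLOCK = 1024
WARP_SIZE = 32


def launch_fits(regs_per_thread: int, threads_per_block: int, regs_per_block: int) -> bool:
    """Simplified register check for one kernel launch (ignores per-warp rounding)."""
    if threads_per_block > MAX_THREADS_PER_BLOCK:
        return False
    return regs_per_thread * threads_per_block <= regs_per_block


def max_threads_that_fit(regs_per_thread: int, regs_per_block: int) -> int:
    """Largest whole-warp block size whose registers fit within regs_per_block."""
    threads = min(regs_per_block // regs_per_thread, MAX_THREADS_PER_BLOCK)
    return threads - threads % WARP_SIZE


if __name__ == "__main__":
    # Hypothetical kernel using 40 registers per thread, launched with 1024 threads per block.
    for name, limit in [("Jetson Nano", NANO_REGS_PER_BLOCK), ("desktop GPU", DESKTOP_REGS_PER_BLOCK)]:
        print(f"{name}: needs {40 * 1024} registers, fits={launch_fits(40, 1024, limit)}, "
              f"max block size={max_threads_that_fit(40, limit)}")
    # Jetson Nano: needs 40960 registers, fits=False, max block size=800
    # desktop GPU: needs 40960 registers, fits=True, max block size=1024
```

The same launch can therefore succeed on a desktop GPU and fail on the Nano regardless of model size; which Paddle op issues the failing launch is not visible from this traceback.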
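An incomplete answer in a Paddle issue suggested that the hardware lacks sufficient resources, but I don't find that convincing: the exported model is only about 30 MB, and I previously deployed a model of over 200 MB without running into this problem. Is there a good way to solve this?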