Model trained with PaddleX fails with an error when deployed on Jetson Nano via the Python API
Created by: jedibobo
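Hardware: Jetson Nano
JetPack 4.3, CUDA 10.0, cuDNN 7.6.3
Python environment: paddlepaddle-gpu==2.0.0alpha0 (compiled myself)
PaddleX is installed as well. Because of an error saying OpenCV could not be found (it is actually present, symlinked in with ln), I installed the packages one at a time with --no-dependencies.
The model was trained on AI Studio using the project https://aistudio.baidu.com/aistudio/projectdetail/633900
I pruned the model and retrained it following the method in the documentation.
Training code: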
import paddlex as pdx

# train_dataset, eval_dataset and anchor_sizes are defined earlier in the notebook.
num_classes = len(train_dataset.labels)
print('class num:', num_classes)
model = pdx.det.YOLOv3(num_classes=num_classes, backbone='MobileNetV3_large', anchors=anchor_sizes)
model.train(
    num_epochs=20,
    train_dataset=train_dataset,
    train_batch_size=4,
    eval_dataset=eval_dataset,
    learning_rate=0.00025,
    lr_decay_epochs=[10, 15],
    save_interval_epochs=2,
    log_interval_steps=100,
    save_dir='./yolov3_mobilenetv3_prune',
    pretrain_weights='./yolov3_MobileNetV3_large/best_model',
    # pretrain_weights='IMAGENET',
    use_vdl=True,
    sensitivities_file='./sensitivities.data',
    eval_metric_loss=0.10)
Then I exported it as an inference model (with a fixed input size, because I want to use TensorRT):
!paddlex --export_inference --model_dir=./yolov3_mobilenetv3_prune/best_model --save_dir=./yolov3_mobilenetv3_inference_model --fixed_input_shape=[512,512]
The actual deployment fails with the following error:
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    result = model.predict(image_name, eval_transforms)
  File "/home/dlinano/envs/paddle4-paddlex/lib/python3.6/site-packages/paddlex/deploy.py", line 235, in predict
    model_pred = self.raw_predict(preprocessed_input)
  File "/home/dlinano/envs/paddle4-paddlex/lib/python3.6/site-packages/paddlex/deploy.py", line 217, in raw_predict
    self.predictor.zero_copy_run()
RuntimeError: parallel_for failed: too many resources requested for launch
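An incomplete answer in a Paddle issue suggests that the hardware resources are insufficient? I don't find that very convincing: the exported model is currently about 30 MB, and I previously deployed a model of more than 200 MB without hitting this problem. Is there a good way to solve this?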
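
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: in CUDA, "too many resources requested for launch" usually refers to the per-block resources of a single kernel launch (threads, registers, shared memory), not to the size of the model file. The sketch below only illustrates that arithmetic. The limits are the ones I recall for compute capability 5.3 from the CUDA C Programming Guide and should be double-checked; the helper `launch_fits` and the register count of 40 are hypothetical, and real hardware allocates registers at a coarser granularity than this simple product.

```python
# Per-block launch limits for compute capability 5.3 (Jetson Nano / Tegra X1).
# Assumption: values recalled from the CUDA C Programming Guide; verify before relying on them.
MAX_THREADS_PER_BLOCK = 1024
MAX_REGISTERS_PER_BLOCK = 32 * 1024    # desktop Maxwell (5.0 / 5.2) allows 64K
MAX_SHARED_MEM_PER_BLOCK = 48 * 1024   # bytes


def launch_fits(threads_per_block, regs_per_thread, shared_mem_bytes=0,
                max_regs=MAX_REGISTERS_PER_BLOCK):
    """Hypothetical helper: True if one block stays within the per-block limits.

    Simplified model: ignores the warp-level rounding that real hardware
    applies when it allocates registers.
    """
    if threads_per_block > MAX_THREADS_PER_BLOCK:
        return False
    if threads_per_block * regs_per_thread > max_regs:
        return False
    if shared_mem_bytes > MAX_SHARED_MEM_PER_BLOCK:
        return False
    return True


if __name__ == "__main__":
    # A made-up kernel that needs 40 registers per thread.
    print("1024 threads, Nano limit:   ", launch_fits(1024, 40))
    print("1024 threads, desktop limit:", launch_fits(1024, 40, max_regs=64 * 1024))
    print(" 512 threads, Nano limit:   ", launch_fits(512, 40))
```

Under this reading, a 30 MB model could fail where a 200 MB model runs, because what matters is whether one particular operator launches a kernel whose block configuration exceeds the Nano's limits, not how many weights the model has.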