Model training results differ significantly between CPU and GPU
Created by: Colorfu1
learning_rate = 0.001, training on a single device in each case (one CPU or one GPU).

When training on the CPU, the results and loss for the first 20 epochs are as follows:

    Batch 0, loss 2.522635, acc 0.088436 Done epoch: 0
    Epoch: 1 Batch 0, loss 2.114787, acc 0.211000 Done epoch: 1
    Epoch: 2 Batch 0, loss 2.454542, acc 0.079980 Done epoch: 2
    Epoch: 3 Batch 0, loss 1.930440, acc 0.284088 Done epoch: 3
    Epoch: 4 Batch 0, loss 1.789533, acc 0.447309 Done epoch: 4
    Epoch: 5 Batch 0, loss 1.745031, acc 0.273679 Done epoch: 5
    Epoch: 6 Batch 0, loss 1.684448, acc 0.412491 Done epoch: 6
    Epoch: 7 Batch 0, loss 1.654690, acc 0.362373 Done epoch: 7
    Epoch: 8 Batch 0, loss 1.588757, acc 0.495754 Done epoch: 8
    Epoch: 9 Batch 0, loss 1.572071, acc 0.514561 Done epoch: 9
    Epoch: 10 Batch 0, loss 1.540798, acc 0.545184 Done epoch: 10
    Epoch: 11 Batch 0, loss 1.591666, acc 0.621386 Done epoch: 11
    Epoch: 12 Batch 0, loss 1.504963, acc 0.645011 Done epoch: 12
    Epoch: 13 Batch 0, loss 1.534026, acc 0.642602 Done epoch: 13
    Epoch: 14 Batch 0, loss 1.476046, acc 0.655928 Done epoch: 14
    Epoch: 15 Batch 0, loss 1.491074, acc 0.670692 Done epoch: 15
    Epoch: 16 Batch 0, loss 1.444524, acc 0.703621 Done epoch: 16
    Epoch: 17 Batch 0, loss 1.514576, acc 0.728161 Done epoch: 17
    Epoch: 18 Batch 0, loss 1.492055, acc 0.758072 Done epoch: 18
    Epoch: 19 Batch 0, loss 1.408592, acc 0.801796 Done epoch: 19
    Epoch: 20 Batch 0, loss 1.415996, acc 0.744361 Done epoch: 20

################################################

When training on the GPU, the results for the first 20 epochs are as follows:

    Epoch: 0 Batch 0, loss 2.440385, acc 0.112815 Done epoch: 0
    Epoch: 1 Batch 0, loss 2.238379, acc 0.001942 Done epoch: 1
    Epoch: 2 Batch 0, loss 2.268656, acc 0.002467 Done epoch: 2
    Epoch: 3 Batch 0, loss 2.236463, acc 0.000704 Done epoch: 3
    Epoch: 4 Batch 0, loss 2.211993, acc 0.000000 Done epoch: 4
    Epoch: 5 Batch 0, loss 2.192462, acc 0.001938 Done epoch: 5
    Epoch: 6 Batch 0, loss 2.183870, acc 0.001938 Done epoch: 6
    Epoch: 7 Batch 0, loss 2.173697, acc 0.000000 Done epoch: 7
    Epoch: 8 Batch 0, loss 2.173233, acc 0.000000 Done epoch: 8
    Epoch: 9 Batch 0, loss 2.154153, acc 0.000008 Done epoch: 9
    Epoch: 10 Batch 0, loss 2.143353, acc 0.000008 Done epoch: 10
    Epoch: 11 Batch 0, loss 2.136832, acc 0.390633 Done epoch: 11
    Epoch: 12 Batch 0, loss 2.122867, acc 0.446275 Done epoch: 12
    Epoch: 13 Batch 0, loss 2.113915, acc 0.441837 Done epoch: 13
    Epoch: 14 Batch 0, loss 2.116818, acc 0.341406 Done epoch: 14
    Epoch: 15 Batch 0, loss 2.091358, acc 0.479894 Done epoch: 15
    Epoch: 16 Batch 0, loss 2.094740, acc 0.422867 Done epoch: 16
    Epoch: 17 Batch 0, loss 2.086677, acc 0.438015 Done epoch: 17
    Epoch: 18 Batch 0, loss 2.080353, acc 0.438019 Done epoch: 18
    Epoch: 19 Batch 0, loss 2.083027, acc 0.411423 Done epoch: 19
    Epoch: 20 Batch 0, loss 2.068788, acc 0.488256 Done epoch: 20

###############################################

The device configuration in the code is as follows:

    import os
    import paddle.fluid as fluid

    if use_gpu:
        # Run on the first visible CUDA device.
        places = fluid.cuda_places()
        exe = fluid.Executor(place=fluid.CUDAPlace(0))
    else:
        # Run on a single CPU place.
        cpu_num = 1
        places = fluid.cpu_places(cpu_num)
        os.environ['CPU_NUM'] = str(cpu_num)
        exe = fluid.Executor(place=fluid.CPUPlace())

When launching on the GPU, I set the environment variable CUDA_VISIBLE_DEVICES=0.

I kept watching the GPU run: at 400+ epochs, the accuracy is still stuck between 40% and 50%. What could be causing this?
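To make the gap between the two runs easier to see, the log lines can be parsed and lined up epoch by epoch. The script below is only a minimal sketch and is not part of the original training code: `parse_log` and `compare` are hypothetical helper names, and the regular expression assumes the exact log format shown in this post. The sample strings contain three entries per device copied from the logs above.

```python
import re

# Matches one logged step, e.g. "Batch 0, loss 2.440385, acc 0.112815 Done epoch: 0".
# The epoch is taken from the trailing "Done epoch" marker because the first
# CPU entry in the post has no leading "Epoch:" prefix.
STEP_RE = re.compile(
    r"Batch\s+(\d+),\s+loss\s+([0-9.]+),\s+acc\s+([0-9.]+)\s+Done epoch:\s+(\d+)"
)


def parse_log(text):
    """Return {epoch: (loss, acc)} for every logged step found in text."""
    results = {}
    for _batch, loss, acc, epoch in STEP_RE.findall(text):
        results[int(epoch)] = (float(loss), float(acc))
    return results


def compare(cpu_log, gpu_log):
    """Return (epoch, cpu_loss, gpu_loss, cpu_acc, gpu_acc) for epochs in both logs."""
    cpu, gpu = parse_log(cpu_log), parse_log(gpu_log)
    return [
        (e, cpu[e][0], gpu[e][0], cpu[e][1], gpu[e][1])
        for e in sorted(cpu.keys() & gpu.keys())
    ]


# A few entries copied from the logs in this post.
CPU_SAMPLE = """
Batch 0, loss 2.522635, acc 0.088436 Done epoch: 0
Epoch: 10 Batch 0, loss 1.540798, acc 0.545184 Done epoch: 10
Epoch: 20 Batch 0, loss 1.415996, acc 0.744361 Done epoch: 20
"""
GPU_SAMPLE = """
Epoch: 0 Batch 0, loss 2.440385, acc 0.112815 Done epoch: 0
Epoch: 10 Batch 0, loss 2.143353, acc 0.000008 Done epoch: 10
Epoch: 20 Batch 0, loss 2.068788, acc 0.488256 Done epoch: 20
"""

print("epoch  cpu_loss  gpu_loss  cpu_acc   gpu_acc")
for epoch, cl, gl, ca, ga in compare(CPU_SAMPLE, GPU_SAMPLE):
    print(f"{epoch:5d}  {cl:.6f}  {gl:.6f}  {ca:.6f}  {ga:.6f}")
```

Pasting the full CPU and GPU logs into the two sample strings gives the same table for every epoch, which shows at a glance where the two loss curves start to diverge.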