PaddleSlim: loss becomes NaN during quantization
Created by: imistyrain
W0625 16:37:04.272953 1401 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.0, Runtime API Version: 9.0
W0625 16:37:04.277964 1401 device_context.cc:269] device: 0, cuDNN Version: 7.0.
2019-06-25 16:37:04,700-WARNING: Checkpints path doesn't exist: [./checkpoints_quan/]
2019-06-25 16:37:04,701-INFO: Running evaluation
2019-06-25 16:37:07,379-INFO: batch-0; ['acc_top1', 'acc_top5']=[0.75, 0.96875]
2019-06-25 16:37:12,547-INFO: batch-20; ['acc_top1', 'acc_top5']=[0.625, 0.90625]
2019-06-25 16:37:16,756-INFO: batch-40; ['acc_top1', 'acc_top5']=[0.6875, 0.890625]
2019-06-25 16:37:21,288-INFO: batch-60; ['acc_top1', 'acc_top5']=[0.78125, 0.9375]
...
2019-06-25 16:40:20,955-INFO: Final eval result: ['acc_top1', 'acc_top5']=[0.70931906 0.8955403 ]
2019-06-25 16:40:20,955-INFO: Finish evaluation
2019-06-25 16:40:20,959-INFO: QuantizationStrategy::on_epoch_begin
W0625 16:40:23.255398 1401 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
2019-06-25 16:40:23,277-INFO: Finish QuantizationStrategy::on_epoch_begin
I0625 16:40:25.002485 1401 build_strategy.cc:285] SeqOnlyAllReduceOps:0, num_trainers:1
2019-06-25 16:40:25,245-INFO: epoch:0; batch_id:0; ['loss'] = [1.75]
2019-06-25 16:40:31,022-INFO: epoch:0; batch_id:20; ['loss'] = [7.417]
2019-06-25 16:40:36,357-INFO: epoch:0; batch_id:40; ['loss'] = [7.6]
2019-06-25 16:40:41,937-INFO: epoch:0; batch_id:60; ['loss'] = [6.469]
2019-06-25 16:40:47,443-INFO: epoch:0; batch_id:80; ['loss'] = [6.333]
2019-06-25 16:40:53,054-INFO: epoch:0; batch_id:100; ['loss'] = [5.891]
2019-06-25 16:40:58,743-INFO: epoch:0; batch_id:120; ['loss'] = [6.151]
2019-06-25 16:41:04,335-INFO: epoch:0; batch_id:140; ['loss'] = [5.673]
2019-06-25 16:41:10,044-INFO: epoch:0; batch_id:160; ['loss'] = [5.924]
2019-06-25 16:41:15,735-INFO: epoch:0; batch_id:180; ['loss'] = [5.77]
2019-06-25 16:41:21,532-INFO: epoch:0; batch_id:200; ['loss'] = [4.94]
2019-06-25 16:41:27,042-INFO: epoch:0; batch_id:220; ['loss'] = [5.309]
2019-06-25 16:41:32,918-INFO: epoch:0; batch_id:240; ['loss'] = [5.27]
2019-06-25 16:41:38,628-INFO: epoch:0; batch_id:260; ['loss'] = [5.175]
2019-06-25 16:41:44,336-INFO: epoch:0; batch_id:280; ['loss'] = [4.984]
2019-06-25 16:41:49,705-INFO: epoch:0; batch_id:300; ['loss'] = [5.029]
2019-06-25 16:41:55,633-INFO: epoch:0; batch_id:320; ['loss'] = [5.136]
2019-06-25 16:42:01,450-INFO: epoch:0; batch_id:340; ['loss'] = [5.489]
2019-06-25 16:42:07,112-INFO: epoch:0; batch_id:360; ['loss'] = [5.176]
2019-06-25 16:42:12,710-INFO: epoch:0; batch_id:380; ['loss'] = [4.712]
2019-06-25 16:42:18,569-INFO: epoch:0; batch_id:400; ['loss'] = [4.628]
2019-06-25 16:42:24,568-INFO: epoch:0; batch_id:420; ['loss'] = [5.203]
2019-06-25 16:42:30,541-INFO: epoch:0; batch_id:440; ['loss'] = [4.71]
2019-06-25 16:42:36,521-INFO: epoch:0; batch_id:460; ['loss'] = [4.419]
2019-06-25 16:42:42,294-INFO: epoch:0; batch_id:480; ['loss'] = [4.755]
2019-06-25 16:42:48,355-INFO: epoch:0; batch_id:500; ['loss'] = [7.196]
2019-06-25 16:42:54,225-INFO: epoch:0; batch_id:520; ['loss'] = [nan]
2019-06-25 16:43:00,511-INFO: epoch:0; batch_id:540; ['loss'] = [nan]
2019-06-25 16:43:06,494-INFO: epoch:0; batch_id:560; ['loss'] = [nan]
2019-06-25 16:43:12,470-INFO: epoch:0; batch_id:580; ['loss'] = [nan]
2019-06-25 16:43:18,527-INFO: epoch:0; batch_id:600; ['loss'] = [nan]
2019-06-25 16:43:24,372-INFO: epoch:0; batch_id:620; ['loss'] = [nan]
2019-06-25 16:43:30,409-INFO: epoch:0; batch_id:640; ['loss'] = [nan]
2019-06-25 16:43:36,356-INFO: epoch:0; batch_id:660; ['loss'] = [nan]
2019-06-25 16:43:42,149-INFO: epoch:0; batch_id:680; ['loss'] = [nan]
2019-06-25 16:43:48,184-INFO: epoch:0; batch_id:700; ['loss'] = [nan]
2019-06-25 16:43:54,426-INFO: epoch:0; batch_id:720; ['loss'] = [nan]
2019-06-25 16:44:00,288-INFO: epoch:0; batch_id:740; ['loss'] = [nan]
2019-06-25 16:44:06,483-INFO: epoch:0; batch_id:760; ['loss'] = [nan]
2019-06-25 16:44:12,533-INFO: epoch:0; batch_id:780; ['loss'] = [nan]
2019-06-25 16:44:18,773-INFO: epoch:0; batch_id:800; ['loss'] = [nan]
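The log above shows the loss jumping from 1.75 to about 7.4 within the first 20 quantization-training batches. It then drifts down to around 4.5 to 5, spikes to 7.196 at batch 500, and is NaN from batch 520 onward. The minimal Python sketch below finds the first non-finite loss in such a log, together with the last finite value before it, so the divergence point can be located without reading every line. It assumes the `epoch:…; batch_id:…; ['loss'] = [...]` line format shown above. The helper name `first_non_finite` is hypothetical and is not part of PaddleSlim.

```python
import math
import re

# Hypothetical helper, not a PaddleSlim API. Matches log lines of the form:
# 2019-06-25 16:40:25,245-INFO: epoch:0; batch_id:0; ['loss'] = [1.75]
LOSS_RE = re.compile(r"epoch:(\d+); batch_id:(\d+); \['loss'\] = \[([^\]]+)\]")


def first_non_finite(lines):
    """Return (epoch, batch_id, last_finite_loss) for the first NaN/inf loss, or None."""
    last_finite = None
    for line in lines:
        m = LOSS_RE.search(line)
        if m is None:
            continue  # skip evaluation, warning and other non-loss lines
        epoch, batch_id = int(m.group(1)), int(m.group(2))
        value = float(m.group(3))  # float("nan") parses the logged "nan"
        if not math.isfinite(value):
            return epoch, batch_id, last_finite
        last_finite = value
    return None


if __name__ == "__main__":
    log = [
        "2019-06-25 16:42:42,294-INFO: epoch:0; batch_id:480; ['loss'] = [4.755]",
        "2019-06-25 16:42:48,355-INFO: epoch:0; batch_id:500; ['loss'] = [7.196]",
        "2019-06-25 16:42:54,225-INFO: epoch:0; batch_id:520; ['loss'] = [nan]",
    ]
    print(first_non_finite(log))  # (0, 520, 7.196)
```

Because the log only prints every 20th batch, the reported batch_id is an upper bound on where the NaN first appeared. Logging every batch would narrow it down.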
Training was launched with the following command:
python compress.py \
--batch_size 64 \
--model "MobileNet" \
--pretrained_model ./pretrain/MobileNetV1_pretrained \
--compress_config ./configs/quantization.yaml