x86: fix C1E && nx6325 stability problem
The problems are that, with the ACPI vs timer override issue _fixed_,
after using the box for some time (between several seconds and 1 hour,
at random) processes get very high CPU loads (once I've got X using 107%
of the CPU, for example) and the system becomes unresponsive, as though
there were interrupts lost or something similar.

Andreas Herrmann reproduced similar problems:

> Ok, now I've reproduced the stability problem.
> - Using tip/master,
> - reverting e38502eb8aa82314d5ab0eba45f50e6790dadd88 and
> - applying your patch from this posting
>   http://marc.info/?l=linux-kernel&m=121539354224562&w=4
>
> Starting X, firefox, gimp, tuxpaint and doing some drawing in tuxpaint
> results in a slow system. Drawing is almost not possible anymore --
> Selections of new colors, cursors etc. is performed with huge delay
> if it's performed at all.
>
> BTW, the code sets up timer IRQ as Virtual Wire IRQ:
>
> Jul 8 14:57:58 kodscha IO-APIC (apicid-pin) 2-22, 2-23 not connected.
> Jul 8 14:57:58 kodscha ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> Jul 8 14:57:58 kodscha ...trying to set up timer as Virtual Wire IRQ... works.
>
> and both INT0 and INT2 of IOAPIC are masked:
>
> Jul 8 14:57:58 kodscha  NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
> Jul 8 14:57:58 kodscha  00 000 1    0    0   0   0    0    0    00
> Jul 8 14:57:58 kodscha  01 003 0    0    0   0   0    1    1    31
> Jul 8 14:57:58 kodscha  02 003 1    0    0   0   0    0    0    30
>
> I've also seen strange CPU utilization -- with syslog-ng:
>
> top - 15:33:06 up 35 min, 4 users, load average: 1.70, 0.68, 0.37
> Tasks: 64 total, 4 running, 60 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 6.4%us, 87.2%sy, 0.0%ni, 5.8%id, 0.0%wa, 0.6%hi, 0.0%si, 0.0%st
> Mem: 895384k total, 283568k used, 611816k free, 35492k buffers
> Swap: 1959920k total, 0k used, 1959920k free, 163044k cached
>
>   PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
>  4632 root 20  0 17216  800  580 S  104  0.1 0:34.22 syslog-ng
> 28505 root 20  0  205m  11m 4024 S    6  1.3 0:21.16 X
> 28518 root 20  0 56292 5652 4492 S    1  0.6 0:01.80 fluxbox
>     1 root 20  0  3724  608  508 S    0  0.1 0:00.36 init
>
> So far I have no clue why C1E-idle in conjunction with virtual wire
> mode causes this strange behaviour.
>
> ... and I start to think about the root cause of all this.
>
> I've performed similar tests under X with the IRQ0/INT0 configuration and
> I did not see above symptoms.

So let's fall back to the IRQ0/INT0 configuration on this box. This
basically restores the dont-use-the-lapic-timer exception mechanism that
was unconditional on this box prior to commit 8750bf5 ("x86: add C1E
aware idle function").

Signed-off-by: Ingo Molnar <mingo@elte.hu>