- 18 9月, 2010 2 次提交
-
-
git://neil.brown.name/md由 Linus Torvalds 提交于
* 'for-linus' of git://neil.brown.name/md: md: fix v1.x metadata update when a disk is missing. md: call md_update_sb even for 'external' metadata arrays.
-
由 Al Viro 提交于
If a signal hits us outside of a syscall and another gets delivered when we are in sigreturn (e.g. because it had been in sa_mask for the first one and got sent to us while we'd been in the first handler), we have a chance of returning from the second handler to location one insn prior to where we ought to return. If r0 happens to contain -513 (-ERESTARTNOINTR), sigreturn will get confused into doing restart syscall song and dance. Incredible joy to debug, since it manifests as random, infrequent and very hard to reproduce double execution of instructions in userland code... The fix is simple - mark it "don't bother with restarts" in wrapper, i.e. set r8 to 0 in sys_sigreturn and sys_rt_sigreturn wrappers, suppressing the syscall restart handling on return from these guys. They can't legitimately return a restart-worthy error anyway. Testcase: #include <unistd.h> #include <signal.h> #include <stdlib.h> #include <sys/time.h> #include <errno.h> void f(int n) { __asm__ __volatile__( "ldr r0, [%0]\n" "b 1f\n" "b 2f\n" "1:b .\n" "2:\n" : : "r"(&n)); } void handler1(int sig) { } void handler2(int sig) { raise(1); } void handler3(int sig) { exit(0); } main() { struct sigaction s = {.sa_handler = handler2}; struct itimerval t1 = { .it_value = {1} }; struct itimerval t2 = { .it_value = {2} }; signal(1, handler1); sigemptyset(&s.sa_mask); sigaddset(&s.sa_mask, 1); sigaction(SIGALRM, &s, NULL); signal(SIGVTALRM, handler3); setitimer(ITIMER_REAL, &t1, NULL); setitimer(ITIMER_VIRTUAL, &t2, NULL); f(-513); /* -ERESTARTNOINTR */ write(1, "buggered\n", 9); return 1; } Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 9月, 2010 13 次提交
-
-
由 NeilBrown 提交于
If an array with 1.x metadata is assembled with the last disk missing, md doesn't properly record the fact that the disk was missing. This is unlikely to cause a real problem as the event count will be different to the count on the missing disk so it won't be included in the array. However it could still cause confusion. So make sure we clear all the relevant slots, not just the early ones. Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
Now that we depend on md_update_sb to clear variable bits in mddev->flags (rather than trying not to set them) it is important to always call md_update_sb when appropriate. md_check_recovery has this job but explicitly avoids it for ->external metadata arrays. This is not longer appropraite, or needed. However we do want to avoid taking the mddev lock if only MD_CHANGE_PENDING is set as that is not cleared by md_update_sb for external-metadata arrays. Reported-by: N"Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Linus Torvalds 提交于
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hpet: Work around hardware stupidity x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y x86, cpufeature: Suppress compiler warning with gcc 3.x x86, UV: Fix initialization of max_pnode
-
git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6由 Linus Torvalds 提交于
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix potential double put of TCP session reference
-
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6由 Linus Torvalds 提交于
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Optimize ticket spinlocks in fsys_rt_sigprocmask
-
git://github.com/schandinat/linux-2.6由 Linus Torvalds 提交于
* '2.6.36-fixes' of git://github.com/schandinat/linux-2.6: drivers/video/via/ioctl.c: prevent reading uninitialized stack memory
-
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6由 Linus Torvalds 提交于
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6: pcmcia pcnet_cs: try setting io_lines to 16 if card setup fails pcmcia: per-device, not per-socket debug messages pcmcia serial_cs.c: fix multifunction card handling
-
git://git.infradead.org/users/cbou/battery-2.6.36由 Linus Torvalds 提交于
* git://git.infradead.org/users/cbou/battery-2.6.36: apm_power: Add missing break statement intel_pmic_battery: Fix battery charging status on mrst
-
git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog由 Linus Torvalds 提交于
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog: watchdog: Enable NXP LPC32XX support in Kconfig (resend) watchdog: ts72xx_wdt: disable watchdog at probe watchdog: sb_wdog: release irq and reboot notifier in error path and module_exit()
-
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile由 Linus Torvalds 提交于
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: fix formatting bug in register dumps arch/tile: fix memcpy_fromio()/memcpy_toio() signatures arch/tile: Save and restore extra user state for tilegx arch/tile: Change struct sigcontext to be more useful arch/tile: finish const-ifying sys_execve()
-
git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6由 Linus Torvalds 提交于
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: regulator: wm8350-regulator - fix the logic of checking REGULATOR_MODE_STANDBY mode regulator: wm831x-ldo - fix the logic to set REGULATOR_MODE_IDLE and REGULATOR_MODE_STANDBY modes regulator: ab8500 - fix off-by-one value range checking for selector regulator: 88pm8607 - fix value range checking for accessing info->vol_table regulator: isl6271a-regulator - fix regulator_desc parameter for regulator_register() regulator: ad5398 - fix a memory leak regulator: Update e-mail address for Liam Girdwood regulator: set max8998->dev to &pdev->dev. regulator: tps6586x-regulator - fix bit_mask parameter for tps6586x_set_bits() regulator: tps6586x-regulator - fix value range checking for val regulator: max8998 - set max8998->num_regulators regulator: max8998 - fix memory allocation size for max8998->rdev regulator: tps6507x - remove incorrect comments regulator: max1586 - improve the logic of choosing selector regulator: ab8500 - fix the logic to remove already registered regulators in error path regulator: ab3100 - fix the logic to remove already registered regulators in error path regulator/ab8500: move dereference below the check for NULL
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq由 Linus Torvalds 提交于
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: add documentation
-
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6由 Linus Torvalds 提交于
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/radeon/kms: only warn on mipmap size checks in r600 cs checker (v2) drm/radeon/kms: force legacy pll algo for RV620 LVDS drm: fix race between driver loading and userspace open. drm: Use a nondestructive mode for output detect when polling (v2) drm/radeon/kms: fix the colorbuffer CS checker for r300-r500 drm/radeon/kms: increase lockup detection interval to 10 sec for r100-r500 drm/radeon/kms/evergreen: fix backend setup drm: Use a nondestructive mode for output detect when polling drm/radeon: add some missing copyright headers drm: Only decouple the old_fb from the crtc is we call mode_set* drm/radeon/kms: don't enable underscan with interlaced modes drm/radeon/kms: add connector table for Mac x800 drm/radeon/kms: fix regression in RMX code (v2) drm: Fix regression in disable polling e58f637b
-
- 16 9月, 2010 5 次提交
-
-
由 Dan Rosenberg 提交于
The VIAFB_GET_INFO device ioctl allows unprivileged users to read 246 bytes of uninitialized stack memory, because the "reserved" member of the viafb_ioctl_info struct declared on the stack is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: NFlorian Tobias Schandinat <FlorianSchandinat@gmx.de>
-
由 Petr Tesarik 提交于
Tony's fix (f574c843) has a small bug, it incorrectly uses "r3" as a scratch register in the first of the two unlock paths ... it is also inefficient. Optimize the fast path again. Signed-off-by: NPetr Tesarik <ptesarik@suse.cz> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Kevin Wells 提交于
The NXP LPC32XX processor use the same watchdog as the Philips PNX4008 processor. Signed-off-by: NKevin Wells <wellsk40@gmail.com> Tested-by: NWolfram Sang <w.sang@pengutronix.de> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
由 Mika Westerberg 提交于
Since it may be already enabled by bootloader or some other utility. This patch makes sure that the watchdog is disabled before any userspace daemon opens the device. It is also required by the watchdog API. Signed-off-by: NMika Westerberg <mika.westerberg@iki.fi> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
由 Akinobu Mita 提交于
irq and reboot notifier are acquired in module_init() but never released. They should be released correctly, otherwise reloading the module or error during module_init() will cause a problem. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Andrew Sharp <andy.sharp@lsi.com> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
- 15 9月, 2010 20 次提交
-
-
由 Dominik Brodowski 提交于
Some pcnet_cs compatible cards require an exact 16-lines match of the ioport areas specified in CIS, but set the "iolines" value in the CIS incorrectly. We can easily work around this issue -- same as we do in serial_cs -- by first trying setting iolines to the CIS-specified value, and then trying a 16-line match. Reported-and-tested-by: NWolfram Sang <w.sang@pengutronix.de> Hardware-supplied-by: NJochen Frieling <j.frieling@pengutronix.de> CC: netdev@vger.kernel.org Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
As the iomem / ioport setup differs per device, it is much better to print out the device instead of the socket. Tested-by: NWolfram Sang <w.sang@pengutronix.de> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
We shouldn't overwrite pre-set values, and we should also set the port address to the beginning, and not the end of the 8-port range. CC: linux-serial@vger.kernel.org Reported-by: NKomuro <komurojun-mbn@nifty.com> Hardware-supplied-by: NJochen Frieling <j.frieling@pengutronix.de> Tested-by: NWolfram Sang <w.sang@pengutronix.de> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Chris Metcalf 提交于
This cut-and-paste bug was caused by rewriting the register dump code to use only a single printk per line of output. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
-
由 Chris Metcalf 提交于
This tripped up a driver (not yet committed to git). Fix it now. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
-
由 Chris Metcalf 提交于
During context switch, save and restore a couple of additional bits of tilegx user state that can be persistently modified by userspace. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
-
由 Chris Metcalf 提交于
Rather than just using pt_regs, it now contains the actual saved state explicitly, similar to pt_regs. By doing it this way, we provide a cleaner API for userspace (or equivalently, we avoid the need for libc to provide its own definition of sigcontext). While we're at it, move PT_FLAGS_xxx to where they are not visible from userspace. And always pass siginfo and mcontext to signal handlers, even if they claim they don't need it, since sometimes they actually try to use it anyway in practice. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
-
由 Chris Metcalf 提交于
The sys_execve() implementation was properly const-ified but not the declaration, the syscall wrappers, or the compat version. This change completes the constification process. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
-
由 Alex Deucher 提交于
The texture base address registers are in units of 256 bytes. The original CS checker treated these offsets as bytes, so the original check was wrong. I fixed the units in a patch during the 2.6.36 cycle, but this ended up breaking some existing userspace (probably due to a bug in either userspace texture allocation or the drm texture mipmap checker). So for now, until we come up with a better fix, just warn if the mipmap size it too large. This will keep existing userspace working and it should be just as safe as before when we were checking the wrong units. These are GPU MC addresses, so if they fall outside of the VRAM or GART apertures, they end up at the GPU default page, so this should be safe from a security perspective. v2: Just disable the warning. It just spams the log and there's nothing the user can do about it. Signed-off-by: NAlex Deucher <alexdeucher@gmail.com> Cc: Jerome Glisse <glisse@freedesktop.org> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
ssh://master.kernel.org/home/hpa/tree/sec由 Linus Torvalds 提交于
* ssh://master.kernel.org/home/hpa/tree/sec: x86-64, compat: Retruncate rax after ia32 syscall entry tracing x86-64, compat: Test %rax for the syscall number, not %eax compat: Make compat_alloc_user_space() incorporate the access_ok()
-
由 David Howells 提交于
Fix up the IRQ names for the MN10300 on-chip serial ports in the driver as request_interrupt() no longer allows names containing slashes, giving a warning like the following if one is encountered: ------------[ cut here ]------------ WARNING: at fs/proc/generic.c:323 __xlate_proc_name+0x62/0x7c() name 'ttySM0/Rx' Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.infradead.org/mtd-2.6由 Linus Torvalds 提交于
* git://git.infradead.org/mtd-2.6: mtd: pxa3xx: fix build error when CONFIG_MTD_PARTITIONS is not defined mtd: mxc_nand: configure pages per block for v2 controller mtd: OneNAND: Fix loop hang when DMA error at Samsung SoCs mtd: OneNAND: Fix 2KiB pagesize handling at Samsung SoCs mtd: Blackfin NFC: fix invalid free in remove() mtd: Blackfin NFC: fix build error after nand_scan_ident() change mxc_nand: Do not do byte accesses to the NFC buffer.
-
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid由 Linus Torvalds 提交于
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: fix hiddev's use of usb_find_interface HID: fixup blacklist entry for Asus T91MT HID: add device ID for new Asus Multitouch Controller HID: add no-get quirk for eGalax touch controller HID: Add quirk for eGalax touch controler. HID: add support for another BTC Emprex remote control HID: Set Report ID properly for Output reports on the Control endpoint. HID: Kanvus Note A5 tablet needs HID_QUIRK_MULTI_INPUT HID: Add support for chicony multitouch screens.
-
git://git.linux-nfs.org/projects/trondmy/nfs-2.6由 Linus Torvalds 提交于
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies statfs() gives ESTALE error NFS: Fix a typo in nfs_sockaddr_match_ipaddr6 sunrpc: increase MAX_HASHTABLE_BITS to 14 gss:spkm3 miss returning error to caller when import security context gss:krb5 miss returning error to caller when import security context Remove incorrect do_vfs_lock message SUNRPC: cleanup state-machine ordering SUNRPC: Fix a race in rpc_info_open SUNRPC: Fix race corrupting rpc upcall Fix null dereference in call_allocate
-
由 Jeff Moyer 提交于
Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array: if (unlikely(nr < 0)) return -EINVAL; if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp))))) return -EFAULT; ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long. This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: NTavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeff Layton 提交于
cifs_get_smb_ses must be called on a server pointer on which it holds an active reference. It first does a search for an existing SMB session. If it finds one, it'll put the server reference and then try to ensure that the negprot is done, etc. If it encounters an error at that point then it'll return an error. There's a potential problem here though. When cifs_get_smb_ses returns an error, the caller will also put the TCP server reference leading to a double-put. Fix this by having cifs_get_smb_ses only put the server reference if it found an existing session that it could use and isn't returning an error. Cc: stable@kernel.org Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Roland McGrath 提交于
In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a 32-bit tracee in system call entry. A %rax value set via ptrace at the entry tracing stop gets used whole as a 32-bit syscall number, while we only check the low 32 bits for validity. Fix it by truncating %rax back to 32 bits after syscall_trace_enter, in addition to testing the full 64 bits as has already been added. Reported-by: NBen Hawkes <hawkes@sota.gen.nz> Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 H. Peter Anvin 提交于
On 64 bits, we always, by necessity, jump through the system call table via %rax. For 32-bit system calls, in theory the system call number is stored in %eax, and the code was testing %eax for a valid system call number. At one point we loaded the stored value back from the stack to enforce zero-extension, but that was removed in checkin d4d67150. An actual 32-bit process will not be able to introduce a non-zero-extended number, but it can happen via ptrace. Instead of re-introducing the zero-extension, test what we are actually going to use, i.e. %rax. This only adds a handful of REX prefixes to the code. Reported-by: NBen Hawkes <hawkes@sota.gen.nz> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>
-
由 H. Peter Anvin 提交于
compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_access_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: NBen Hawkes <hawkes@sota.gen.nz> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NChris Metcalf <cmetcalf@tilera.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NTony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@kernel.org>
-
由 Thomas Gleixner 提交于
This more or less reverts commits 08be9796 (x86: Force HPET readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict read back to affected ATI chipsets) to the status of commit 8da854cb (x86, hpet: Erratum workaround for read after write of HPET comparator). The delta to commit 8da854cb is mostly comments and the change from WARN_ONCE to printk_once as we know the call path of this function already. This needs really in depth explanation: First of all the HPET design is a complete failure. Having a counter compare register which generates an interrupt on matching values forces the software to do at least one superfluous readback of the counter register. While it is nice in theory to program "absolute" time events it is practically useless because the timer runs at some absurd frequency which can never be matched to real world units. So we are forced to calculate a relative delta and this forces a readout of the actual counter value, adding the delta and programming the compare register. When the delta is small enough we run into the danger that we program a compare value which is already in the past. Due to the compare for equal nature of HPET we need to read back the counter value after writing the compare rehgister (btw. this is necessary for absolute timeouts as well) to make sure that we did not miss the timer event. We try to work around that by setting the minimum delta to a value which is larger than the theoretical time which elapses between the counter readout and the compare register write, but that's only true in theory. A NMI or SMI which hits between the readout and the write can easily push us beyond that limit. This would result in waiting for the next HPET timer interrupt until the 32bit wraparound of the counter happens which takes about 306 seconds. So we designed the next event function to look like: match = read_cnt() + delta; write_compare_ref(match); return read_cnt() < match ? 0 : -ETIME; At some point we got into trouble with certain ATI chipsets. Even the above "safe" procedure failed. The reason was that the write to the compare register was delayed probably for performance reasons. The theory was that they wanted to avoid the synchronization of the write with the HPET clock, which is understandable. So the write does not hit the compare register directly instead it goes to some intermediate register which is copied to the real compare register in sync with the HPET clock. That opens another window for hitting the dreaded "wait for a wraparound" problem. To work around that "optimization" we added a read back of the compare register which either enforced the update of the just written value or just delayed the readout of the counter enough to avoid the issue. We unfortunately never got any affirmative info from ATI/AMD about this. One thing is sure, that we nuked the performance "optimization" that way completely and I'm pretty sure that the result is worse than before some HW folks came up with those. Just for paranoia reasons I added a check whether the read back compare register value was the same as the value we wrote right before. That paranoia check triggered a couple of years after it was added on an Intel ICH9 chipset. Venki added a workaround (commit 8da854cb) which was reading the compare register twice when the first check failed. We considered this to be a penalty in general and restricted the readback (thus the wasted CPU cycles) to the known to be affected ATI chipsets. This turned out to be a utterly wrong decision. 2.6.35 testers experienced massive problems and finally one of them bisected it down to commit 30a564be which spured some further investigation. Finally we got confirmation that the write to the compare register can be delayed by up to two HPET clock cycles which explains the problems nicely. All we can do about this is to go back to Venki's initial workaround in a slightly modified version. Just for the record I need to say, that all of this could have been avoided if hardware designers and of course the HPET committee would have thought about the consequences for a split second. It's out of my comprehension why designing a working timer is so hard. There are two ways to achieve it: 1) Use a counter wrap around aware compare_reg <= counter_reg implementation instead of the easy compare_reg == counter_reg Downsides: - It needs more silicon. - It needs a readout of the counter to apply a relative timeout. This is necessary as the counter does not run in any useful (and adjustable) frequency and there is no guarantee that the counter which is used for timer events is the same which is used for reading the actual time (and therefor for calculating the delta) Upsides: - None 2) Use a simple down counter for relative timer events Downsides: - Absolute timeouts are not possible, which is not a problem at all in the context of an OS and the expected max. latencies/jitter (also see Downsides of #1) Upsides: - It needs less or equal silicon. - It works ALWAYS - It is way faster than a compare register based solution (One write versus one write plus at least one and up to four reads) I would not be so grumpy about all of this, if I would not have been ignored for many years when pointing out these flaws to various hardware folks. I really hate timers (at least those which seem to be designed by janitors). Though finally we got a reasonable explanation plus a solution and I want to thank all the folks involved in chasing it down and providing valuable input to this. Bisected-by: NNix <nix@esperi.org.uk> Reported-by: NArtur Skawina <art.08.09@gmail.com> Reported-by: NDamien Wyart <damien.wyart@free.fr> Reported-by: NJohn Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-