1. 28 Sep, 2010 8 commits
    • S
      omap4: control: Add ctrl_pad_base to omap_globals · 0c349246
      Santosh Shilimkar committed
      On OMAP4, the control module is divided into four IP blocks:
      - CTRL_MODULE_CORE			0x4a002000
      - CTRL_MODULE_PAD_CORE		0x4a100000
      - CTRL_MODULE_WKUP			0x4a30c000
      - CTRL_MODULE_PAD_WKUP		0x4a31e000
      
      Addressing all the modules with a single base address is not possible
      with 16-bit offsets. The mux code manages the pad core and pad
      wakeup base addresses inside the mux framework. For other uses,
      only the control core and control pad bases are necessary, so this patch
      maps only the needed pad control base address, which is used by device
      drivers and infrastructure code.
      
      The main control core base is still kept the same in this patch to
      keep git-bisect working. This will be fixed in the relevant patch
      in this series.
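
The 16-bit offset limit can be checked with a little arithmetic. The sketch below uses the four base addresses listed in the message; `reachable_with_u16_offset` is a hypothetical helper written for illustration and is not part of the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base addresses quoted in the commit message. */
#define CTRL_MODULE_CORE      0x4a002000u
#define CTRL_MODULE_PAD_CORE  0x4a100000u
#define CTRL_MODULE_WKUP      0x4a30c000u
#define CTRL_MODULE_PAD_WKUP  0x4a31e000u

/*
 * Hypothetical helper: true if 'addr' can be expressed as
 * 'base' plus an unsigned 16-bit offset.
 */
static bool reachable_with_u16_offset(uint32_t base, uint32_t addr)
{
    return addr >= base && (addr - base) <= UINT16_MAX;
}
```

The distance from CTRL_MODULE_CORE to CTRL_MODULE_PAD_CORE alone is 0xfe000, which does not fit in 16 bits, so a register inside one block is reachable from that block's own base but not from a neighbouring one.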
      Signed-off-by: Benoit Cousson <b-cousson@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      0c349246
    • B
      OMAP4: clocks: Fix ES2 clock issues · 0edc9e85
      Benoit Cousson committed
      Fix a few OMAP4430 clock tree problems after the recent manual merge of the
      various ES2 clock patches:
      
      - usim optional clock and its parent had the same name, rename the parent
      usim_fclk -> usim_ck
      
      - OPTFCLKEN_CLK32K is not handled anymore by the USBPHYOCP2SCP module in ES2
      Create a new clock that belongs to CM_ALWON_USBPHY_CLKCTRL register
      
      This patch depends on some of the PRCM macro updates from Rajendra.
      Signed-off-by: Benoit Cousson <b-cousson@ti.com>
      [paul@pwsan.com: tweaked patch description]
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Rajendra Nayak <rnayak@ti.com>
      0edc9e85
    • R
      OMAP4: powerdomain: Update DSS logic state for ES2 · bb722f33
      Rajendra Nayak committed
      DSS on ES2 supports only OSWR, hence remove the support
      for CSWR from the powerdomain framework.
      Signed-off-by: Rajendra Nayak <rnayak@ti.com>
      Signed-off-by: Benoît Cousson <b-cousson@ti.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Kevin Hilman <khilman@deeprootsystems.com>
      bb722f33
    • R
      OMAP4: PM: Define additional registers for ES2 · fdd4f409
      Rajendra Nayak committed
      4430 ES2 has a few new registers added and a few modified
      from ES1. This patch adds all the register changes in PRM
      and CM for OMAP4430 ES2.
      Signed-off-by: Rajendra Nayak <rnayak@ti.com>
      Signed-off-by: Benoît Cousson <b-cousson@ti.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Kevin Hilman <khilman@deeprootsystems.com>
      fdd4f409
    • R
      OMAP4: CM & PRM: Update PRCM register bitshifts and masks for ES2 · 568997cf
      Rajendra Nayak committed
      This patch updates the PRM and CM register bitshifts and masks
      for OMAP4430 ES2.0.
      
      Also replace the BITFIELD macro with the shift operator in order
      to be consistent with the previous OMAP2 & 3 format.
      
      Sort the register list in comments in order to have a consistent
      register order and avoid future changes during code generation.
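
As a rough illustration of the two mask styles, the sketch below defines the same field both ways. The `BITFIELD` definition and the `EXAMPLE_FIELD_*` names are assumptions made for this example; the real macro and register names in the PRM/CM headers may differ.

```c
#include <stdint.h>

/*
 * Assumed BITFIELD-style helper: a mask covering bits lo..hi inclusive.
 * This is a guess at the shape of such a macro, not the kernel's definition.
 */
#define BITFIELD(lo, hi) \
    ((uint32_t)(((1ull << ((hi) - (lo) + 1)) - 1) << (lo)))

/* Hypothetical 2-bit field occupying bits 17:16 of a register. */
#define EXAMPLE_FIELD_SHIFT     16
#define EXAMPLE_FIELD_MASK_OLD  BITFIELD(16, 17)                /* macro style */
#define EXAMPLE_FIELD_MASK_NEW  (0x3u << EXAMPLE_FIELD_SHIFT)   /* shift style */

/* Extract the field value from a raw register value. */
static uint32_t example_field_get(uint32_t reg)
{
    return (reg & EXAMPLE_FIELD_MASK_NEW) >> EXAMPLE_FIELD_SHIFT;
}
```

Both definitions expand to 0x00030000; the shift form simply makes the field width and position visible at the definition site.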
      Signed-off-by: Rajendra Nayak <rnayak@ti.com>
      Signed-off-by: Benoît Cousson <b-cousson@ti.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Kevin Hilman <khilman@deeprootsystems.com>
      568997cf
    • B
      OMAP4: clock: Add optional clock nodes · 1c03f42f
      Benoit Cousson committed
      OMAP4 IP optional clocks require an explicit enable in the module CLKCTRL
      register. To allow that, we have to create artificial clock
      nodes that represent these clock inputs in the IP.
      
      Notes:
      - Temporarily use OMAP3 names for GPIO optional clocks until the GPIO hwmod
      conversion is done. It will enforce the usage of OMAP4 names as the reference.
      - Temporarily use OMAP3 names for the TIMER main clock (gptX_fck) until the TIMER hwmod
      conversion is done. During that conversion, the new name will have to be used.
      Signed-off-by: Benoit Cousson <b-cousson@ti.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Kevin Hilman <khilman@deeprootsystems.com>
      Cc: Rajendra Nayak <rnayak@ti.com>
      1c03f42f
    • B
      OMAP4: clock: Fix clock names and align with hwmod names · 0e433271
      Benoit Cousson committed
      The OMAP4 hwmod data introduced the new naming convention for TI
      IPs (See patch OMAP4: hwmod: Add partial hwmod support for OMAP4430 ES1.0)
      
      The leaf clock names are using the same IP name and thus must be
      modified to match the clock populated in the hwmod data.
      
      - Fix some leaf clock nodes that were using an _iclk instead of the _fclk
      suffix.
      - Fix some wrong interface clock names for master IPs connected to the
      interconnect.
      
      Please note that because nodes are sorted by name, the name
      change will introduce a rather ugly diff that is a little hard to follow.
      
      The timers' clock con_id still uses the old gptX_fck name until the
      gptimer driver is updated to the omap_device framework.
      Timer entries in the hwmod DB are still disabled until the migration
      of the timer to platform_driver + omap_hwmod.
      Signed-off-by: Benoit Cousson <b-cousson@ti.com>
      [paul@pwsan.com: manually resolved conflicts with Rajendra's clock patch]
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Rajendra Nayak <rnayak@ti.com>
      0e433271
    • R
      OMAP4: clocks: Update clock tree for ES2 · 76cf5295
      Rajendra Nayak committed
      This patch updates the clock tree with all the
      changes in OMAP4430 ES2.
      
      clock nodes added
      -1- tie_low_clock_ck
      -2- abe_dpll_bypass_clk_mux_ck
      
      clock nodes deleted
      -1- dpll_sys_ref_clk
      -2- per_sgx_fclk
      -3- usbphyocp2scp_ick
      Signed-off-by: Rajendra Nayak <rnayak@ti.com>
      Signed-off-by: Benoît Cousson <b-cousson@ti.com>
      [paul@pwsan.com: added comment re ES1 clocks to top of file]
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Cc: Kevin Hilman <khilman@deeprootsystems.com>
      76cf5295
  2. 21 Sep, 2010 5 commits
  3. 19 Sep, 2010 10 commits
    • A
      alpha: deal with multiple simultaneously pending signals · 494486a1
      Al Viro committed
      Unlike the other targets, alpha sets _one_ sigframe and
      buggers off until the next syscall/interrupt, even if
      more signals are pending.  It leads to quite a few unpleasant
      inconsistencies, starting with SIGSEGV potentially arriving
      not where it should and including e.g. mess with sigsuspend();
      consider two pending signals blocked until sigsuspend()
      unblocks them.  We pick the first one; then, if we are hit
      by interrupt while in the handler, we process the second one
      as well.  If we are not, and if no syscalls had been made,
      we get out of the first handler and leave the second signal
      pending; normally sigreturn() would've picked it anyway, but
      here it starts with restoring the original mask and voila -
      the second signal is blocked again.  On everything else we
      get both delivered consistently.
      
      It's actually easy to fix; the only thing to watch out for
      is prevention of double syscall restart.  Fortunately, the
      idea I've nicked from arm fix by rmk works just fine...
      
      Testcase demonstrating the behaviour in question; on alpha
      we get one or both flags set (usually one), on everything
      else both are always set.
      	#include <signal.h>
      	#include <stdio.h>
      	int had1, had2;
      	void f1(int sig) { had1 = 1; }
      	void f2(int sig) { had2 = 1; }
      	int main(void)
      	{
      		sigset_t set1, set2;
      		sigemptyset(&set1);
      		sigemptyset(&set2);
      		sigaddset(&set2, 1);
      		sigaddset(&set2, 2);
      		signal(1, f1);
      		signal(2, f2);
      		/* block both signals, then make both pending */
      		sigprocmask(SIG_SETMASK, &set2, NULL);
      		raise(1);
      		raise(2);
      		/* unblock everything; both handlers should run */
      		sigsuspend(&set1);
      		printf("had1:%d had2:%d\n", had1, had2);
      		return 0;
      	}
      Tested-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      494486a1
    • A
      alpha: fix a 14 years old bug in sigreturn tracing · 53293638
      Al Viro committed
      The way sigreturn() is implemented on alpha breaks PTRACE_SYSCALL,
      all the way back to 1.3.95, when alpha grew PTRACE_SYSCALL support.
      
      What happens is direct return to ret_from_syscall, in order to bypass
      mangling of a3 (error indicator) and prevent other mutilations of
      registers (e.g. by syscall restart).  That's fine, but... the entire
      TIF_SYSCALL_TRACE codepath is kept separate on alpha and post-syscall
      stopping/notifying the tracer is after the syscall.  And the normal
      path we are forcibly switching to doesn't have it.
      
      So we end up with *one* stop in traced sigreturn() vs. two in other
      syscalls.  And yes, strace is visibly broken by that; try to strace
      the following
      	#include <signal.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	void f(int sig) {}
      	int main(void)
      	{
      		signal(SIGHUP, f);
      		raise(SIGHUP);
      		write(1, "eeeek\n", 6);
      		return 0;
      	}
      and watch the show.  The
      	close(1)                                = 405
      in the end of strace output is coming from return value of write() (6 ==
      __NR_close on alpha) and syscall number of exit_group() (__NR_exit_group ==
      405 there).
      
      The fix is fairly simple - the only thing we end up missing is the call
      of syscall_trace() and we can tell whether we'd been called from the
      SYSCALL_TRACE path by checking ra value.  Since we are setting the
      switch_stack up (that's what sys_sigreturn() does), we have the right
      environment for calling syscall_trace() - just before we call
      undo_switch_stack() and return.  Since undo_switch_stack() will overwrite
      s0 anyway, we can use it to store the result of "has it been called from
      SYSCALL_TRACE path?" check.  The same thing applies in rt_sigreturn().
      Tested-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      53293638
    • A
      alpha: unb0rk sigsuspend() and rt_sigsuspend() · 392fb6e3
      Al Viro committed
      Old code used to set regs->r0 and regs->r19 to force the right
      return value.  Leaving that after switch to ERESTARTNOHAND
      was a Bad Idea(tm), since now that screws the restart - if we
      hit the case when get_signal_to_deliver() returns 0, we will
      step back to syscall insn, with v0 set to EINTR and a3 to 1.
      The latter won't matter, since EINTR is 4, aka __NR_write.
      
      Testcase:
      
      	#define _GNU_SOURCE	/* must precede all includes */
      	#include <signal.h>
      	#include <unistd.h>
      	#include <sys/syscall.h>
      
      	int main(void)
      	{
      		sigset_t mask;
      		sigemptyset(&mask);
      		sigaddset(&mask, SIGCONT);
      		sigprocmask(SIG_SETMASK, &mask, NULL);
      		kill(0, SIGCONT);
      		/* arguments chosen to look like write(1, "b0rken\n", 7) */
      		syscall(__NR_sigsuspend, 1, "b0rken\n", 7);
      		return 0;
      	}
      
      results on alpha in immediate message to stdout...
      
      Fix is obvious; moreover, since we don't need regs anymore, we can
      switch to normal prototypes for these guys and lose the wrappers.
      Even better, rt_sigsuspend() is identical to generic version in
      kernel/signal.c now.
      Tested-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      392fb6e3
    • A
      alpha: belated ERESTART_RESTARTBLOCK race fix · 2deba1bd
      Al Viro committed
      same thing as had been done on other targets back in 2003 -
      move setting ->restart_block.fn into {rt_,}sigreturn().
      Tested-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      2deba1bd
    • M
      alpha: Shift perf event pending work earlier in timer interrupt · bdc8b891
      Michael Cree committed
      Pending work from the performance event subsystem is executed in
      the timer interrupt.  This patch shifts the call to
      perf_event_do_pending() before the call to update_process_times()
      as the latter may call back into the perf event subsystem and it
      is prudent to have the pending work executed first.
      Signed-off-by: Michael Cree <mcree@orcon.net.nz>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      bdc8b891
    • M
      alpha: wire up fanotify and prlimit64 syscalls · 531f0474
      Mikael Pettersson committed
      The 2.6.36-rc kernel added three new system calls:
      fanotify_init, fanotify_mark, and prlimit64.  This
      patch wires them up on Alpha.
      
      Built and booted on an XP900.  Untested beyond that.
      Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      531f0474
    • A
      alpha: kill big kernel lock · 12e750d9
      Arnd Bergmann committed
      All uses of the BKL on alpha are totally bogus, nothing
      is really protected by this. Remove the remaining users
      so we don't have to mark alpha as 'depends on BKL'.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: linux-alpha@vger.kernel.org
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      12e750d9
    • T
      alpha: fix build breakage in asm/cacheflush.h · b97f897d
      Tejun Heo committed
      Alpha SMP flush_icache_user_range() is implemented as an inline
      function inside include/asm/cacheflush.h.  It dereferences @current
      but doesn't include linux/sched.h and thus causes build failure if
      linux/sched.h wasn't included previously.  Fix it by including the
      needed header file explicitly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      b97f897d
    • M
    • J
      31019075
  4. 18 Sep, 2010 4 commits
  5. 16 Sep, 2010 1 commit
  6. 15 Sep, 2010 10 commits
    • C
      arch/tile: fix formatting bug in register dumps · 7040dea4
      Chris Metcalf committed
      This cut-and-paste bug was caused by rewriting the register dump
      code to use only a single printk per line of output.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      7040dea4
    • C
      arch/tile: fix memcpy_fromio()/memcpy_toio() signatures · 0fab59e5
      Chris Metcalf committed
      This tripped up a driver (not yet committed to git).  Fix it now.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      0fab59e5
    • C
      arch/tile: Save and restore extra user state for tilegx · a802fc68
      Chris Metcalf committed
      During context switch, save and restore a couple of additional bits of
      tilegx user state that can be persistently modified by userspace.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      a802fc68
    • C
      arch/tile: Change struct sigcontext to be more useful · 74fca9da
      Chris Metcalf committed
      Rather than just using pt_regs, it now contains the actual saved
      state explicitly, similar to pt_regs.  By doing it this way, we
      provide a cleaner API for userspace (or equivalently, we avoid the
      need for libc to provide its own definition of sigcontext).
      
      While we're at it, move PT_FLAGS_xxx to where they are not visible
      from userspace.  And always pass siginfo and mcontext to signal
      handlers, even if they claim they don't need it, since sometimes
      they actually try to use it anyway in practice.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      74fca9da
    • C
      arch/tile: finish const-ifying sys_execve() · e6e6c46d
      Chris Metcalf committed
      The sys_execve() implementation was properly const-ified but not
      the declaration, the syscall wrappers, or the compat version.
      This change completes the constification process.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      e6e6c46d
    • D
      MN10300: Fix up the IRQ names for the on-chip serial ports · a4128b03
      David Howells committed
      Fix up the IRQ names for the MN10300 on-chip serial ports in the driver as
      request_interrupt() no longer allows names containing slashes, giving a warning
      like the following if one is encountered:
      
      	------------[ cut here ]------------
      	WARNING: at fs/proc/generic.c:323 __xlate_proc_name+0x62/0x7c()
      	name 'ttySM0/Rx'
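
The restriction can be pictured as a simple name check. The sketch below is a userspace illustration only: `irq_name_ok` is a hypothetical helper, not a kernel function, and a colon-separated name such as `ttySM0:Rx` is an assumed replacement rather than the name the driver necessarily uses.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: an IRQ name is acceptable for use as a /proc
 * entry only if it is non-NULL and contains no '/' path separator.
 */
static bool irq_name_ok(const char *name)
{
    return name != NULL && strchr(name, '/') == NULL;
}
```

With this check, `"ttySM0/Rx"` is rejected because the slash would be interpreted as a directory separator, while a name using another delimiter passes.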
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4128b03
    • R
      x86-64, compat: Retruncate rax after ia32 syscall entry tracing · eefdca04
      Roland McGrath committed
      In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a
      32-bit tracee in system call entry.  A %rax value set via ptrace at the
      entry tracing stop gets used whole as a 32-bit syscall number, while we
      only check the low 32 bits for validity.
      
      Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
      in addition to testing the full 64 bits as has already been added.
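
The hole is easy to reproduce in plain integer arithmetic. The following is a minimal userspace model, not the entry code itself: `NR_SYSCALLS_MODEL` is an arbitrary table size chosen for the example, and the three helpers are hypothetical names for the narrow check, the wide check, and the truncation step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary syscall table size for this model (an assumption). */
#define NR_SYSCALLS_MODEL 338u

/* Validity check that looks only at the low 32 bits (the %eax view). */
static bool low32_valid(uint64_t rax)
{
    return (uint32_t)rax < NR_SYSCALLS_MODEL;
}

/* Validity check on the whole 64-bit register (the %rax view). */
static bool full64_valid(uint64_t rax)
{
    return rax < NR_SYSCALLS_MODEL;
}

/* Truncate back to 32 bits, as done after the entry tracing stop. */
static uint64_t retruncate(uint64_t rax)
{
    return (uint32_t)rax;
}
```

A tracer-supplied value such as 0x100000005 passes the 32-bit check yet would index far outside the table if used whole; after truncation it becomes 5 and both checks agree.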
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      eefdca04
    • H
      x86-64, compat: Test %rax for the syscall number, not %eax · 36d001c7
      H. Peter Anvin committed
      On 64 bits, we always, by necessity, jump through the system call
      table via %rax.  For 32-bit system calls, in theory the system call
      number is stored in %eax, and the code was testing %eax for a valid
      system call number.  At one point we loaded the stored value back from
      the stack to enforce zero-extension, but that was removed in checkin
      d4d67150.  An actual 32-bit process
      will not be able to introduce a non-zero-extended number, but it can
      happen via ptrace.
      
      Instead of re-introducing the zero-extension, test what we are
      actually going to use, i.e. %rax.  This only adds a handful of REX
      prefixes to the code.
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      36d001c7
    • H
      compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5
      H. Peter Anvin committed
      compat_alloc_user_space() expects the caller to independently call
      access_ok() to verify the returned area.  A missing call could
      introduce problems on some architectures.
      
      This patch incorporates the access_ok() check into
      compat_alloc_user_space() and also adds a sanity check on the length.
      The existing compat_alloc_user_space() implementations are renamed
      arch_compat_alloc_user_space() and are used as part of the
      implementation of the new global function.
      
      This patch assumes NULL will cause __get_user()/__put_user() to either
      fail or access userspace on all architectures.  This should be
      followed by checking the return value of compat_alloc_user_space()
      for NULL in the callers, at which time the access_ok() in the callers
      can also be removed.
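
The shape of the new wrapper can be modelled in userspace roughly as follows. Everything here is a stand-in: addresses are plain 32-bit integers, `MODEL_USER_TOP` is an assumed address-space limit, `model_access_ok` and `model_arch_alloc` imitate access_ok() and arch_compat_alloc_user_space(), the "more than half the address space" length limit is an assumption about what the sanity check looks like, and a return value of 0 plays the role of NULL.

```c
#include <stdint.h>

/* Assumed top of the 32-bit user address space for this model. */
#define MODEL_USER_TOP 0xC0000000u

/* Stand-in for access_ok(): is [addr, addr + len) entirely below the top? */
static int model_access_ok(uint32_t addr, uint32_t len)
{
    return len <= MODEL_USER_TOP && addr <= MODEL_USER_TOP - len;
}

/* Stand-in for arch_compat_alloc_user_space(): carve space below the
 * user stack pointer. The subtraction may wrap for a bogus sp or len. */
static uint32_t model_arch_alloc(uint32_t sp, uint32_t len)
{
    return sp - len;
}

/* Model of the combined function: length sanity check, arch allocation,
 * then the range check. Returns 0 (modelling NULL) on any failure. */
static uint32_t model_compat_alloc(uint32_t sp, uint32_t len)
{
    uint32_t ptr;

    if (len > (UINT32_MAX >> 1))    /* assumed limit: half the space */
        return 0;

    ptr = model_arch_alloc(sp, len);

    if (!model_access_ok(ptr, len))
        return 0;

    return ptr;
}
```

Folding the range check into the allocator means a caller that forgets its own access_ok() still gets a failure value instead of a pointer outside user space.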
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: <stable@kernel.org>
      c41d68a5
    • T
      x86: hpet: Work around hardware stupidity · 54ff7e59
      Thomas Gleixner committed
      This more or less reverts commits 08be9796 (x86: Force HPET
      readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
      read back to affected ATI chipsets) to the status of commit 8da854cb
      (x86, hpet: Erratum workaround for read after write of HPET
      comparator).
      
      The delta to commit 8da854cb is mostly comments and the change from
      WARN_ONCE to printk_once as we know the call path of this function
      already.
      
      This really needs an in-depth explanation:
      
      First of all the HPET design is a complete failure. Having a counter
      compare register which generates an interrupt on matching values
      forces the software to do at least one superfluous readback of the
      counter register.
      
      While it is nice in theory to program "absolute" time events it is
      practically useless because the timer runs at some absurd frequency
      which can never be matched to real world units. So we are forced to
      calculate a relative delta and this forces a readout of the actual
      counter value, adding the delta and programming the compare
      register. When the delta is small enough we run into the danger that
      we program a compare value which is already in the past. Due to the
      compare for equal nature of HPET we need to read back the counter
      value after writing the compare register (btw. this is necessary for
      absolute timeouts as well) to make sure that we did not miss the timer
      event. We try to work around that by setting the minimum delta to a
      value which is larger than the theoretical time which elapses between
      the counter readout and the compare register write, but that's only
      true in theory. An NMI or SMI which hits between the readout and the
      write can easily push us beyond that limit. This would result in
      waiting for the next HPET timer interrupt until the 32-bit wraparound
      of the counter happens, which takes about 306 seconds.
      
      So we designed the next event function to look like:
      
         match = read_cnt() + delta;
         write_compare_ref(match);
         return read_cnt() < match ? 0 : -ETIME;
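
The sequence above can be exercised with a small software model. This is a sketch under stated assumptions, not the kernel's HPET code: `struct hpet_model`, `try_program`, and the `stall` parameter (counter ticks lost to an NMI/SMI between the readout and the final check) are all invented for illustration, and the comparison uses a signed 32-bit difference so that it stays correct across counter wraparound.

```c
#include <errno.h>
#include <stdint.h>

/* Toy model of a free-running 32-bit counter with one compare register. */
struct hpet_model {
    uint32_t counter;
    uint32_t compare;
};

/* Reading the counter advances it by 'elapsed' ticks to model time passing. */
static uint32_t read_cnt(struct hpet_model *h, uint32_t elapsed)
{
    h->counter += elapsed;
    return h->counter;
}

/* Program an event 'delta' ticks ahead, then verify it is still in the
 * future. Returns 0 on success, -ETIME if the match was already missed. */
static int next_event(struct hpet_model *h, uint32_t delta, uint32_t stall)
{
    uint32_t match = read_cnt(h, 1) + delta;

    h->compare = match;

    /* Signed difference: wrap-safe "has the counter reached match?" */
    return (int32_t)(read_cnt(h, stall) - match) >= 0 ? -ETIME : 0;
}

/* Convenience wrapper: start the counter at 'start' and program one event. */
static int try_program(uint32_t start, uint32_t delta, uint32_t stall)
{
    struct hpet_model h = { start, 0 };

    return next_event(&h, delta, stall);
}
```

With a delta of 100 ticks, a 10-tick stall is harmless, while a 500-tick stall makes the function report -ETIME so the caller can retry instead of waiting for the counter to wrap.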
      
      At some point we got into trouble with certain ATI chipsets. Even the
      above "safe" procedure failed. The reason was that the write to the
      compare register was delayed probably for performance reasons. The
      theory was that they wanted to avoid the synchronization of the write
      with the HPET clock, which is understandable. So the write does not
      hit the compare register directly; instead it goes to some intermediate
      register which is copied to the real compare register in sync with the
      HPET clock. That opens another window for hitting the dreaded "wait
      for a wraparound" problem.
      
      To work around that "optimization" we added a read back of the compare
      register which either enforced the update of the just written value or
      just delayed the readout of the counter enough to avoid the issue. We
      unfortunately never got any affirmative info from ATI/AMD about this.
      
      One thing is sure, that we nuked the performance "optimization" that
      way completely and I'm pretty sure that the result is worse than
      before some HW folks came up with those.
      
      Just for paranoia reasons I added a check whether the read back
      compare register value was the same as the value we wrote right
      before. That paranoia check triggered a couple of years after it was
      added on an Intel ICH9 chipset. Venki added a workaround (commit
      8da854cb) which was reading the compare register twice when the first
      check failed. We considered this to be a penalty in general and
      restricted the readback (thus the wasted CPU cycles) to the known to
      be affected ATI chipsets.
      
      This turned out to be an utterly wrong decision. 2.6.35 testers
      experienced massive problems and finally one of them bisected it down
      to commit 30a564be, which spurred some further investigation.
      
      Finally we got confirmation that the write to the compare register can
      be delayed by up to two HPET clock cycles which explains the problems
      nicely. All we can do about this is to go back to Venki's initial
      workaround in a slightly modified version.
      
      Just for the record I need to say that all of this could have been
      avoided if hardware designers and of course the HPET committee had
      thought about the consequences for a split second. It's out of my
      comprehension why designing a working timer is so hard. There are two
      ways to achieve it:
      
       1) Use a counter wrap around aware compare_reg <= counter_reg
          implementation instead of the easy compare_reg == counter_reg
      
          Downsides:
      
      	- It needs more silicon.
      
      	- It needs a readout of the counter to apply a relative
      	  timeout. This is necessary as the counter does not run in
      	  any useful (and adjustable) frequency and there is no
      	  guarantee that the counter which is used for timer events is
      	  the same which is used for reading the actual time (and
      	  therefore for calculating the delta)
      
          Upsides:
      
      	- None
      
       2) Use a simple down counter for relative timer events
      
          Downsides:
      
      	- Absolute timeouts are not possible, which is not a problem
      	  at all in the context of an OS and the expected
      	  max. latencies/jitter (also see Downsides of #1)
      
          Upsides:
      
      	- It needs less or equal silicon.
      
      	- It works ALWAYS
      
      	- It is way faster than a compare register based solution (One
      	  write versus one write plus at least one and up to four
      	  reads)
      
      I would not be so grumpy about all of this if I had not been
      ignored for many years when pointing out these flaws to various
      hardware folks. I really hate timers (at least those which seem to be
      designed by janitors).
      
      Though finally we got a reasonable explanation plus a solution and I
      want to thank all the folks involved in chasing it down and providing
      valuable input to this.
      Bisected-by: Nix <nix@esperi.org.uk>
      Reported-by: Artur Skawina <art.08.09@gmail.com>
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Reported-by: John Drescher <drescherjm@gmail.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      54ff7e59
  7. 14 Sep, 2010 2 commits