1. 24 Dec 2011, 2 commits
  2. 21 Dec 2011, 1 commit
  3. 18 Dec 2011, 1 commit
  4. 14 Dec 2011, 1 commit
    • x86: Add per-cpu stat counter for APIC ICR read tries · 346b46be
      Authored by Fernando Luis Vázquez Cao
      In the IPI delivery slow path (NMI delivery) we retry the ICR
      read to check for delivery completion a limited number of times.
      
      [ The reason for the limited retries is that in some of the places
        where it is used (cpu boot, kdump, etc.) IPI delivery might not
        succeed (due to a firmware bug or system crash, for example)
        and in such a case it is better to give up and resume
        execution of other code. ]
      
      This patch adds a new entry to /proc/interrupts, RTR, which
      tells user space the number of times we retried the ICR read in
      the IPI delivery slow path.
      
      This should give some insight into how well the APIC
      message delivery hardware is working - if the counts are way
      too large then we are hitting a (very-) slow path way too
      often.
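      
      [ A minimal sketch of where the counting happens, modeled on the
        xAPIC busy-wait loop; the counter name icr_read_retry_count and
        the loop bounds are assumptions about the final code: ]
      
      	u32 safe_apic_wait_icr_idle(void)
      	{
      		u32 send_status;
      		int timeout = 0;
      
      		do {
      			/* The Delivery Status bit clears on acceptance */
      			send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
      			if (!send_status)
      				break;
      			/* One more retry, accounted per-cpu, shown as RTR */
      			inc_irq_stat(icr_read_retry_count);
      			udelay(100);
      		} while (timeout++ < 1000);
      
      		return send_status;
      	}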
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Cc: Jörn Engel <joern@logfs.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/n/tip-vzsp20lo2xdzh5f70g0eis2s@git.kernel.org
      [ extended the changelog ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 06 Dec 2011, 4 commits
  6. 05 Dec 2011, 1 commit
    • x86: Fix boot failures on older AMD CPU's · 8e8da023
      Authored by Linus Torvalds
      People with old AMD chips are getting hung boots, because commit
      bcb80e53 ("x86, microcode, AMD: Add microcode revision to
      /proc/cpuinfo") moved the microcode detection too early into
      "early_init_amd()".
      
      At that point we are *so* early in the boot that the exception tables
      haven't even been set up yet, so the whole
      
      	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
      
      doesn't actually work: if the rdmsr takes a #GP fault (due to a
      non-existent MSR register on older CPUs), we can't fix it up yet, and
      the boot fails.
      
      Fix it by simply moving the code to a slightly later point in the boot
      (init_amd() instead of early_init_amd()), since the kernel itself
      doesn't even really care about the microcode patchlevel at this point
      (or really ever: it's made available to user space in /proc/cpuinfo, and
      updated if you do a microcode load).
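      
      [ A sketch of the shape of the fix, with the rest of init_amd()
        elided; the exact guard and placement are assumptions: ]
      
      	static void __cpuinit init_amd(struct cpuinfo_x86 *c)
      	{
      		u32 dummy;
      
      		/*
      		 * Safe here: the exception tables exist, so a #GP on
      		 * CPUs lacking this MSR is fixed up and rdmsr_safe()
      		 * returns an error instead of hanging the boot.
      		 */
      		rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
      	}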
      Reported-tested-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
      Tested-by: Bob Tracy <rct@gherkin.frus.com>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Dec 2011, 1 commit
    • xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf
      Authored by Konrad Rzeszutek Wilk
      The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
      pm_idle and default_idle") was to have one call - disable_cpuidle() -
      which would keep pm_idle from being molested by other code.  It
      disallows cpuidle_idle_call from being set to pm_idle (which is
      excellent).
      
      But in select_idle_routine() and idle_setup(), pm_idle can still be
      set to amd_e400_idle, mwait_idle or default_idle, depending on some
      CPU flags (MWAIT) and, in the AMD case, on the type of CPU.
      
      In the case of mwait_idle we can hit instances where the hypervisor
      (Amazon EC2, specifically) sets the MWAIT flag and we get:
      
        Brought up 2 CPUs
        invalid opcode: 0000 [#1] SMP
      
        Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
        RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
        ...
        Call Trace:
         [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
         [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
        RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
         RSP <ffff8801d28ddf10>
      
      In the case of amd_e400_idle we don't get such spectacular crashes,
      but we do end up doing an MSR access which is trapped by the
      hypervisor, and then follow it up with a yield hypercall.  Meaning we
      end up going to the hypervisor twice instead of just once.
      
      The behavior before v3.0 was that pm_idle was set to default_idle
      regardless of select_idle_routine()/idle_setup().
      
      We want to restore that, but only for one specific case: Xen.  This
      patch does that.
      
      Fixes RH BZ #739499 and Ubuntu #881076
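      
      [ A sketch of the idea; the placement in xen_arch_setup() and the
        exact set of assignments are assumptions: ]
      
      	/* In the Xen guest setup path, before the generic idle
      	 * selection runs: pin the idle routine so that neither
      	 * select_idle_routine() nor idle_setup() can replace it. */
      	static void __init xen_arch_setup(void)
      	{
      		disable_cpuidle();	/* keep cpuidle away from pm_idle */
      		pm_idle = default_idle;	/* never mwait_idle/amd_e400_idle */
      	}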
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 22 Nov 2011, 1 commit
  9. 20 Nov 2011, 1 commit
  10. 17 Nov 2011, 5 commits
  11. 14 Nov 2011, 1 commit
  12. 12 Nov 2011, 3 commits
  13. 11 Nov 2011, 1 commit
  14. 10 Nov 2011, 5 commits
  15. 08 Nov 2011, 1 commit
  16. 07 Nov 2011, 1 commit
  17. 03 Nov 2011, 2 commits
    • thp: share get_huge_page_tail() · b35a35b5
      Authored by Andrea Arcangeli
      This avoids duplicating the function in every architecture's gup_fast.
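      
      [ A sketch of the shared helper, assuming it moves to a common
        header; the assertions follow the tail-page rules described in
        the refcounting fix below: ]
      
      	static inline void get_huge_page_tail(struct page *page)
      	{
      		/*
      		 * __split_huge_page_refcount() cannot run from under
      		 * us: the caller holds a reference on the head page.
      		 */
      		VM_BUG_ON(page_mapcount(page) < 0);
      		VM_BUG_ON(atomic_read(&page->_count) != 0);
      		/* Tail-page references are accounted in _mapcount. */
      		atomic_inc(&page->_mapcount);
      	}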
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: tail page refcounting fix · 70b50f94
      Authored by Andrea Arcangeli
      Michel, while working on the working set estimation code, noticed
      that calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
      wasn't safe if the pfn ended up being a tail page of a transparent
      hugepage under splitting by __split_huge_page_refcount().
      
      He then found that the problem could also theoretically materialize
      with page_cache_get_speculative() during the speculative radix tree
      lookups that use get_page_unless_zero() in SMP, if the radix tree
      page is freed and reallocated and get_user_pages is called on it
      before page_cache_get_speculative has a chance to call
      get_page_unless_zero().
      
      So the best way to fix the problem is to keep page_tail->_count zero at
      all times.  This will guarantee that get_page_unless_zero() can never
      succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
      is unused for all tail pages of a compound page, so we can simply
      account the tail page references there and transfer them to
      tail_page->_count in __split_huge_page_refcount() (in addition to the
      head_page->_mapcount).
      
      While debugging this s/_count/_mapcount/ change I also noticed that
      get_page is called by direct-io.c on pages returned by get_user_pages.
      That wasn't entirely safe because the two atomic_incs in get_page
      weren't atomic.  Other get_user_pages users, such as the secondary-MMU
      page fault code that establishes shadow pagetables, never call a
      superfluous get_page after get_user_pages returns.  It's safer to make
      get_page universally safe for tail pages and to use get_page_foll()
      within follow_page (inside get_user_pages()).  get_page_foll() is safe
      to do the refcounting for tail pages without taking any locks because
      it runs within PT-lock-protected critical sections (the PT lock for
      ptes and page_table_lock for pmd_trans_huge).
      
      The standard get_page(), as invoked by direct-io, will instead now
      take the compound_lock, but still only for tail pages.  The direct-io
      paths are usually I/O bound and the compound_lock is per-THP, so it
      is very fine-grained and there's no risk of scalability issues with
      it.  A simple direct-io benchmark with the full lockdep (PROVE_LOCKING)
      and spinlock debugging infrastructure enabled shows identical
      performance and no overhead, so it's worth it.  Ideally direct-io
      should stop calling get_page() on pages returned by get_user_pages():
      the spinlock in get_page() is already optimized away for no-THP
      builds, but doing get_page() on tail pages returned by GUP is
      generally a rare operation and usually only runs in I/O paths.
      
      This new refcounting on page_tail->_mapcount, in addition to avoiding
      new RCU critical sections, will also allow the working set estimation
      code to work without any further complexity associated with tail page
      refcounting under THP.
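      
      [ A sketch of the split described above, assuming a
        __get_page_tail_foll() helper that does the _mapcount accounting;
        the names come from the changelog, the details are assumptions: ]
      
      	static inline void get_page_foll(struct page *page)
      	{
      		if (unlikely(PageTail(page)))
      			/*
      			 * Lockless: the PT lock held on the GUP path
      			 * keeps __split_huge_page_refcount() away.
      			 */
      			__get_page_tail_foll(page, true);
      		else {
      			/*
      			 * Normal pages and compound heads must already
      			 * have an elevated _count here.
      			 */
      			VM_BUG_ON(atomic_read(&page->_count) <= 0);
      			atomic_inc(&page->_count);
      		}
      	}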
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 02 Nov 2011, 8 commits