1. 06 Dec 2011, 3 commits
    • x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions · f62ef5f3
      Authored by Andreas Herrmann
      I've received complaints that the numa_node attribute for family
      15h model 00-0fh (e.g. Interlagos) northbridge functions shows
      -1 instead of the proper node ID.
      
      Correct this with attached quirks (similar to quirks for other
      AMD CPU families used in multi-socket systems).
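      
      For reference, a minimal sketch of what such a quirk looks like,
      modeled on quirk_amd_nb_node() in arch/x86/kernel/quirks.c; the
      config-space offset (0x60) and the device ID constant below are
      illustrative, not copied verbatim from the patch:
      
          static void __devinit quirk_amd_nb_node(struct pci_dev *dev)
          {
                  u32 val;
          
                  /* The node ID is encoded in the northbridge's HT
                   * configuration space (assumed offset 0x60). */
                  pci_read_config_dword(dev, 0x60, &val);
          
                  /* Guard against firmware handing back a bogus node. */
                  if (node_online(val & 7))
                          set_dev_node(&dev->dev, val & 7);
          }
          DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD,
                                  PCI_DEVICE_ID_AMD_15H_NB_F0,
                                  quirk_amd_nb_node);
      
      The fixup runs once per matching northbridge function after PCI
      enumeration, so sysfs then reports the correct numa_node.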
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Frank Arnold <frank.arnold@amd.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/20111202072143.GA31916@alberich.amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/rtc, mrst: Don't register a platform RTC device for Intel MID platforms · 35d47699
      Authored by Mathias Nyman
      Intel MID x86 platforms have a memory-mapped virtual RTC
      instead. No MID platform has the default RTC I/O ports (and
      accessing them may do weird stuff).
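      
      The fix amounts to an early bail-out in the platform RTC
      registration. A minimal sketch, assuming the mrst_identify_cpu()
      helper (non-zero on Intel MID) and the rtc_device already declared
      in arch/x86/kernel/rtc.c:
      
          static __init int add_rtc_cmos(void)
          {
          #ifdef CONFIG_X86_MRST
                  /* Intel MID platforms have no legacy RTC I/O ports,
                   * only a memory-mapped virtual RTC: don't register
                   * the port-based rtc_cmos device. */
                  if (mrst_identify_cpu())
                          return -ENODEV;
          #endif
                  platform_device_register(&rtc_device);
                  return 0;
          }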
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Cc: feng.tang@intel.com
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode · 2cd1c8d4
      Authored by Konrad Rzeszutek Wilk
      Fix an outstanding issue that has been reported since 2.6.37.
      Under heavy load, a machine processing "fork()" calls could
      crash with:
      
      BUG: unable to handle kernel paging request at f573fc8c
      IP: [<c01abc54>] swap_count_continued+0x104/0x180
      *pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000
      Oops: 0000 [#1] SMP
      Modules linked in:
      Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
      EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3
      EIP is at swap_count_continued+0x104/0x180
      .. snip..
      Call Trace:
       [<c01ac222>] ? __swap_duplicate+0xc2/0x160
       [<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0
       [<c01ac2e4>] ? swap_duplicate+0x14/0x40
       [<c01a0a6b>] ? copy_pte_range+0x45b/0x500
       [<c01a0ca5>] ? copy_page_range+0x195/0x200
       [<c01328c6>] ? dup_mmap+0x1c6/0x2c0
       [<c0132cf8>] ? dup_mm+0xa8/0x130
       [<c013376a>] ? copy_process+0x98a/0xb30
       [<c013395f>] ? do_fork+0x4f/0x280
       [<c01573b3>] ? getnstimeofday+0x43/0x100
       [<c010f770>] ? sys_clone+0x30/0x40
       [<c06c048d>] ? ptregs_clone+0x15/0x48
       [<c06bfb71>] ? syscall_call+0x7/0xb
      
      The problem is that in copy_page_range() we turn lazy mode on,
      and then in swap_entry_free() we call swap_count_continued()
      which ends up in:
      
               map = kmap_atomic(page, KM_USER0) + offset;
      
      and then later we touch *map.
      
      Since we are running in batched (lazy) mode, the PTE mapping is
      not actually set up synchronously by kmap_atomic, so touching
      *map ends up dereferencing a page whose mapping has not yet
      been established.
      
      Looking at kmap_atomic_prot_pfn(), it uses
      'arch_flush_lazy_mmu_mode' and doing the same in
      kmap_atomic_prot() and __kunmap_atomic() makes the problem go
      away.
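      
      That is, the fix is a one-line flush right after the PTE write.
      A sketch of the kmap_atomic_prot() side (arch/x86/mm/highmem_32.c),
      with the surrounding bookkeeping abbreviated:
      
          void *kmap_atomic_prot(struct page *page, pgprot_t prot)
          {
                  unsigned long vaddr;
                  int idx, type;
          
                  /* ... pagefault_disable() and the lowmem fast
                   * path elided ... */
                  type = kmap_atomic_idx_push();
                  idx = type + KM_TYPE_NR * smp_processor_id();
                  vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
                  set_pte(kmap_pte - idx, mk_pte(page, prot));
                  /* Force the batched (lazy-mmu) PTE update out now,
                   * so the mapping is valid before the caller
                   * dereferences it. */
                  arch_flush_lazy_mmu_mode();
          
                  return (void *)vaddr;
          }
      
      __kunmap_atomic() gets the same flush after clearing the PTE.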
      
      Interestingly, commit b8bcfe99 ("x86/paravirt: remove lazy
      mode in interrupts") removed part of this to fix an interrupt
      issue - but it went too far and did not consider this scenario.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 05 Dec 2011, 13 commits
  3. 04 Dec 2011, 1 commit
    • xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf
      Authored by Konrad Rzeszutek Wilk
      The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
      pm_idle and default_idle") was to have one call, disable_cpuidle(),
      which would make pm_idle not be molested by other code. It disallows
      pm_idle being set to cpuidle_idle_call (which is excellent).
      
      But in select_idle_routine() and idle_setup(), pm_idle can still
      be set to amd_e400_idle, mwait_idle or default_idle. This depends
      on CPU flags (MWAIT) and, in the AMD case, on the type of CPU.
      
      In the case of mwait_idle we can hit instances where the hypervisor
      (Amazon EC2, specifically) sets the MWAIT CPU flag and we get:
      
        Brought up 2 CPUs
        invalid opcode: 0000 [#1] SMP
      
        Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
        RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
        ...
        Call Trace:
         [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
         [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
        RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
         RSP <ffff8801d28ddf10>
      
      In the case of amd_e400_idle we don't get such spectacular crashes,
      but we do end up making an MSR access, which is trapped in the
      hypervisor, and then follow it up with a yield hypercall. That means
      we go into the hypervisor twice instead of just once.
      
      Before v3.0, the behavior was that pm_idle was set to
      default_idle regardless of select_idle_routine()/idle_setup().
      
      We want to do that, but only for one specific case: Xen.  This patch
      does that.
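      
      Concretely, the idea is a Xen-only helper that pins pm_idle to
      default_idle, called from the Xen setup path alongside
      disable_cpuidle(). A sketch (treat the exact placement and return
      convention as illustrative):
      
          #ifdef CONFIG_XEN
          /* Under Xen, force the idle loop to default_idle (safe_halt)
           * so that select_idle_routine()/idle_setup() cannot later
           * pick mwait_idle or amd_e400_idle. */
          bool xen_set_default_idle(void)
          {
                  bool ret = !!pm_idle;
          
                  pm_idle = default_idle;
          
                  return ret;
          }
          #endif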
      
      Fixes RH BZ #739499 and Ubuntu #881076
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 22 Nov 2011, 1 commit
  5. 20 Nov 2011, 1 commit
  6. 17 Nov 2011, 5 commits
  7. 14 Nov 2011, 1 commit
  8. 12 Nov 2011, 3 commits
  9. 11 Nov 2011, 1 commit
  10. 10 Nov 2011, 5 commits
  11. 08 Nov 2011, 1 commit
  12. 07 Nov 2011, 1 commit
  13. 03 Nov 2011, 2 commits
    • thp: share get_huge_page_tail() · b35a35b5
      Authored by Andrea Arcangeli
      This avoids duplicating the function in every architecture's
      gup_fast implementation.
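      
      A sketch of the shared helper, as a static inline in
      include/linux/mm.h (comments mine):
      
          static inline void get_huge_page_tail(struct page *page)
          {
                  /* __split_huge_page_refcount() cannot run from
                   * under us: the gup_fast caller holds off
                   * splitting. */
                  VM_BUG_ON(page_mapcount(page) < 0);
                  VM_BUG_ON(atomic_read(&page->_count) != 0);
                  /* Tail page references are accounted in _mapcount,
                   * keeping _count at zero (see the refcounting fix
                   * below). */
                  atomic_inc(&page->_mapcount);
          }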
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: tail page refcounting fix · 70b50f94
      Authored by Andrea Arcangeli
      Michel, while working on the working set estimation code, noticed
      that calling get_page_unless_zero() on a random
      pfn_to_page(random_pfn) wasn't safe if the pfn ended up being a
      tail page of a transparent hugepage being split by
      __split_huge_page_refcount().
      
      He then found that the problem could also theoretically materialize
      with page_cache_get_speculative() during the speculative radix tree
      lookups that use get_page_unless_zero() on SMP, if the radix tree
      page is freed and reallocated and get_user_pages() is called on it
      before page_cache_get_speculative() has a chance to call
      get_page_unless_zero().
      
      So the best way to fix the problem is to keep page_tail->_count zero at
      all times.  This will guarantee that get_page_unless_zero() can never
      succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
      is unused for all tail pages of a compound page, so we can simply
      account the tail page references there and transfer them to
      tail_page->_count in __split_huge_page_refcount() (in addition to the
      head_page->_mapcount).
      
      While debugging this s/_count/_mapcount/ change I also noticed that
      get_page() is called by direct-io.c on pages returned by
      get_user_pages(). That wasn't entirely safe because the two
      atomic_inc()s in get_page() weren't done as one atomic operation.
      Other get_user_pages() users, like the secondary-MMU page fault
      path that establishes shadow pagetables, never call a superfluous
      get_page() after get_user_pages() returns. It's safer to make
      get_page() universally safe for tail pages and to use
      get_page_foll() within follow_page() (inside get_user_pages()).
      get_page_foll() can safely do the refcounting for tail pages
      without taking any locks because it runs within PT-lock-protected
      critical sections (the PT lock for ptes and page_table_lock for
      pmd_trans_huge); a sketch follows.
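      
      A sketch of that split, modeled on the mm/internal.h side of the
      change (treat the details as illustrative):
      
          /* Only callable under the PT lock, where
           * __split_huge_page_refcount() is excluded: */
          static inline void __get_page_tail_foll(struct page *page,
                                                  bool get_page_head)
          {
                  VM_BUG_ON(page_mapcount(page->first_page) <= 0);
                  VM_BUG_ON(!PageTail(page));
                  if (get_page_head)
                          atomic_inc(&page->first_page->_count);
                  /* Tail references live in _mapcount, keeping
                   * _count at zero at all times. */
                  atomic_inc(&page->_mapcount);
          }
          
          static inline void get_page_foll(struct page *page)
          {
                  if (unlikely(PageTail(page)))
                          __get_page_tail_foll(page, true);
                  else {
                          VM_BUG_ON(atomic_read(&page->_count) <= 0);
                          atomic_inc(&page->_count);
                  }
          }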
      
      The standard get_page(), as invoked by direct-io, will now take
      the compound_lock, but still only for tail pages. The direct-io
      paths are usually I/O bound and the compound_lock is per THP, so
      it is very fine-grained and there is no risk of scalability issues
      with it. A simple direct-io benchmark with all lockdep
      prove-locking and spinlock debugging infrastructure enabled shows
      identical performance and no overhead. So it's worth it. Ideally
      direct-io should stop calling get_page() on pages returned by
      get_user_pages(). The spinlock in get_page() is already optimized
      away for no-THP builds, but doing get_page() on tail pages
      returned by GUP is generally a rare operation and usually only
      run in I/O paths.
      
      This new refcounting in page_tail->_mapcount, in addition to
      avoiding new RCU critical sections, will also allow the working
      set estimation code to work without any further complexity
      associated with tail page refcounting under THP.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 02 Nov 2011, 2 commits