1. 28 3月, 2012 11 次提交
  2. 24 3月, 2012 9 次提交
  3. 23 3月, 2012 4 次提交
  4. 22 3月, 2012 13 次提交
    • K
      xen/smp: Fix bringup bug in AP code. · 106b4438
      Konrad Rzeszutek Wilk 提交于
      The CPU hotplug code has now a callback to help bring up the CPU.
      Without the call we end up getting:
      
       BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]
      Modules linked in:
      CPU ] Pid: 6, comm: migration/0 Not tainted 3.3.0upstream-01180-ged378a52 #1 Dell Inc. PowerEdge T105 /0RR825
      RIP: e030:[<ffffffff810d3b8b>]  [<ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf0
      RSP: e02b:ffff8800ceaabdb0  EFLAGS: 00000293
      .. snip..
      Call Trace:
       [<ffffffff810d3b10>] ? stop_one_cpu_nowait+0x50/0x50
       [<ffffffff810d3841>] cpu_stopper_thread+0xf1/0x1c0
       [<ffffffff815a9776>] ? __schedule+0x3c6/0x760
       [<ffffffff815aa749>] ? _raw_spin_unlock_irqrestore+0x19/0x30
       [<ffffffff810d3750>] ? res_counter_charge+0x150/0x150
       [<ffffffff8108dc76>] kthread+0x96/0xa0
       [<ffffffff815b27e4>] kernel_thread_helper+0x4/0x10
       [<ffffffff815aacbc>] ? retint_restore_ar
      
      Thix fixes it.
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      106b4438
    • J
      crypto: twofish-x86_64-3way - module init/exit functions should be static · ff0a70fe
      Jussi Kivilinna 提交于
      This caused conflict with camellia-x86_64 when compiled into kernel, same
      function names and not static.
      Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      ff0a70fe
    • J
      crypto: camellia-x86_64 - module init/exit functions should be static · 676a3804
      Jussi Kivilinna 提交于
      This caused conflict with twofish-x86_64-3way when compiled into kernel,
      same function names and not static.
      Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      676a3804
    • A
      numa_emulation: fix cpumask_of_node() · d71b5a73
      Andrea Arcangeli 提交于
      Without this fix the cpumask_of_node() for a fake=numa=2 is:
      
          cpumask 0 ff
          cpumask 1 ff
      
      with the fix it's correct and it's set to:
      
          cpumask 0 55
          cpumask 1 aa
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d71b5a73
    • X
      hugetlb: remove prev_vma from hugetlb_get_unmapped_area_topdown() · b69add21
      Xiao Guangrong 提交于
      After looking up the vma which covers or follows the cached search
      address, the following condition is always true:
      
      	!prev_vma || (addr >= prev_vma->vm_end)
      
      so we can stop checking the previous VMA altogether.
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b69add21
    • X
      mm: search from free_area_cache for the bigger size · b716ad95
      Xiao Guangrong 提交于
      If the required size is bigger than cached_hole_size it is better to
      search from free_area_cache - it is easier to get a free region,
      specifically for the 64 bit process whose address space is large enough
      
      Do it just as hugetlb_get_unmapped_area_topdown() in arch/x86/mm/hugetlbpage.c
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b716ad95
    • X
      hugetlb: try to search again if it is really needed · cbde83e2
      Xiao Guangrong 提交于
      Search again only if some holes may be skipped in the first pass.
      
      [akpm@linux-foundation.org: clean up crazy compound definition]
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbde83e2
    • M
      sparc: use block_sigmask() · ce24d8a1
      Matt Fleming 提交于
      Use the new helper function introduced in commit 5e6292c0 ("signal:
      add block_sigmask() for adding sigmask to current->blocked") which
      centralises the code for updating current->blocked after successfully
      delivering a signal and reduces the amount of duplicate code across
      architectures.  In the past some architectures got this code wrong, so
      using this helper function should stop that from happening again.
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce24d8a1
    • M
      xtensa: use set_current_blocked() and block_sigmask() · d12f7c4a
      Matt Fleming 提交于
      As described in commit e6fa16ab ("signal: sigprocmask() should do
      retarget_shared_pending()") the modification of current->blocked is
      incorrect as we need to check whether the signal we're about to block is
      pending in the shared queue.
      
      Also, use the new helper function introduced in commit 5e6292c0
      ("signal: add block_sigmask() for adding sigmask to current->blocked")
      which centralises the code for updating current->blocked after
      successfully delivering a signal and reduces the amount of duplicate code
      across architectures.  In the past some architectures got this code wrong,
      so using this helper function should stop that from happening again.
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d12f7c4a
    • M
      xtensa: don't mask signals if we fail to setup signal stack · 3785006a
      Matt Fleming 提交于
      setup_frame() needs to return an indication of whether it succeeded or
      failed in setting up the signal stack frame.  If setup_frame() fails then
      we must not modify current->blocked.
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3785006a
    • M
      xtensa: no need to reset handler if SA_ONESHOT · ff6d21e7
      Matt Fleming 提交于
      get_signal_to_deliver() already resets the signal handler if SA_ONESHOT
      is set in ka->sa.sa_flags, there's no need to do it again in
      handle_signal().
      
      Furthermore, because we were modifying ka->sa.sa_handler (which is a
      copy of sighand->action[]) instead of sighand->action[] the original
      code actually had no effect on signal delivery.
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff6d21e7
    • M
      xtensa: don't reimplement force_sigsegv() · fa47ac59
      Matt Fleming 提交于
      Instead of open coding the sequence from force_sigsegv() just call it.
      This also fixes a bug because we were modifying ka->sa.sa_handler (which
      is a copy of sighand->action[]), whereas the intention of the code was to
      modify sighand->action[] directly.
      
      As the original code was working with a copy it had no effect on signal
      delivery.
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fa47ac59
    • A
      mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode · 1a5a9906
      Andrea Arcangeli 提交于
      In some cases it may happen that pmd_none_or_clear_bad() is called with
      the mmap_sem hold in read mode.  In those cases the huge page faults can
      allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
      false positive from pmd_bad() that will not like to see a pmd
      materializing as trans huge.
      
      It's not khugepaged causing the problem, khugepaged holds the mmap_sem
      in write mode (and all those sites must hold the mmap_sem in read mode
      to prevent pagetables to go away from under them, during code review it
      seems vm86 mode on 32bit kernels requires that too unless it's
      restricted to 1 thread per process or UP builds).  The race is only with
      the huge pagefaults that can convert a pmd_none() into a
      pmd_trans_huge().
      
      Effectively all these pmd_none_or_clear_bad() sites running with
      mmap_sem in read mode are somewhat speculative with the page faults, and
      the result is always undefined when they run simultaneously.  This is
      probably why it wasn't common to run into this.  For example if the
      madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
      fault, the hugepage will not be zapped, if the page fault runs first it
      will be zapped.
      
      Altering pmd_bad() not to error out if it finds hugepmds won't be enough
      to fix this, because zap_pmd_range would then proceed to call
      zap_pte_range (which would be incorrect if the pmd become a
      pmd_trans_huge()).
      
      The simplest way to fix this is to read the pmd in the local stack
      (regardless of what we read, no need of actual CPU barriers, only
      compiler barrier needed), and be sure it is not changing under the code
      that computes its value.  Even if the real pmd is changing under the
      value we hold on the stack, we don't care.  If we actually end up in
      zap_pte_range it means the pmd was not none already and it was not huge,
      and it can't become huge from under us (khugepaged locking explained
      above).
      
      All we need is to enforce that there is no way anymore that in a code
      path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
      can run into a hugepmd.  The overhead of a barrier() is just a compiler
      tweak and should not be measurable (I only added it for THP builds).  I
      don't exclude different compiler versions may have prevented the race
      too by caching the value of *pmd on the stack (that hasn't been
      verified, but it wouldn't be impossible considering
      pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
      and there's no external function called in between pmd_trans_huge and
      pmd_none_or_clear_bad).
      
      		if (pmd_trans_huge(*pmd)) {
      			if (next-addr != HPAGE_PMD_SIZE) {
      				VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
      				split_huge_page_pmd(vma->vm_mm, pmd);
      			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
      				continue;
      			/* fall through */
      		}
      		if (pmd_none_or_clear_bad(pmd))
      
      Because this race condition could be exercised without special
      privileges this was reported in CVE-2012-1179.
      
      The race was identified and fully explained by Ulrich who debugged it.
      I'm quoting his accurate explanation below, for reference.
      
      ====== start quote =======
            mapcount 0 page_mapcount 1
            kernel BUG at mm/huge_memory.c:1384!
      
          At some point prior to the panic, a "bad pmd ..." message similar to the
          following is logged on the console:
      
            mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).
      
          The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
          the page's PMD table entry.
      
              143 void pmd_clear_bad(pmd_t *pmd)
              144 {
          ->  145         pmd_ERROR(*pmd);
              146         pmd_clear(pmd);
              147 }
      
          After the PMD table entry has been cleared, there is an inconsistency
          between the actual number of PMD table entries that are mapping the page
          and the page's map count (_mapcount field in struct page). When the page
          is subsequently reclaimed, __split_huge_page() detects this inconsistency.
      
             1381         if (mapcount != page_mapcount(page))
             1382                 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
             1383                        mapcount, page_mapcount(page));
          -> 1384         BUG_ON(mapcount != page_mapcount(page));
      
          The root cause of the problem is a race of two threads in a multithreaded
          process. Thread B incurs a page fault on a virtual address that has never
          been accessed (PMD entry is zero) while Thread A is executing an madvise()
          system call on a virtual address within the same 2 MB (huge page) range.
      
                     virtual address space
                    .---------------------.
                    |                     |
                    |                     |
                  .-|---------------------|
                  | |                     |
                  | |                     |<-- B(fault)
                  | |                     |
            2 MB  | |/////////////////////|-.
            huge <  |/////////////////////|  > A(range)
            page  | |/////////////////////|-'
                  | |                     |
                  | |                     |
                  '-|---------------------|
                    |                     |
                    |                     |
                    '---------------------'
      
          - Thread A is executing an madvise(..., MADV_DONTNEED) system call
            on the virtual address range "A(range)" shown in the picture.
      
          sys_madvise
            // Acquire the semaphore in shared mode.
            down_read(&current->mm->mmap_sem)
            ...
            madvise_vma
              switch (behavior)
              case MADV_DONTNEED:
                   madvise_dontneed
                     zap_page_range
                       unmap_vmas
                         unmap_page_range
                           zap_pud_range
                             zap_pmd_range
                               //
                               // Assume that this huge page has never been accessed.
                               // I.e. content of the PMD entry is zero (not mapped).
                               //
                               if (pmd_trans_huge(*pmd)) {
                                   // We don't get here due to the above assumption.
                               }
                               //
                               // Assume that Thread B incurred a page fault and
                   .---------> // sneaks in here as shown below.
                   |           //
                   |           if (pmd_none_or_clear_bad(pmd))
                   |               {
                   |                 if (unlikely(pmd_bad(*pmd)))
                   |                     pmd_clear_bad
                   |                     {
                   |                       pmd_ERROR
                   |                         // Log "bad pmd ..." message here.
                   |                       pmd_clear
                   |                         // Clear the page's PMD entry.
                   |                         // Thread B incremented the map count
                   |                         // in page_add_new_anon_rmap(), but
                   |                         // now the page is no longer mapped
                   |                         // by a PMD entry (-> inconsistency).
                   |                     }
                   |               }
                   |
                   v
          - Thread B is handling a page fault on virtual address "B(fault)" shown
            in the picture.
      
          ...
          do_page_fault
            __do_page_fault
              // Acquire the semaphore in shared mode.
              down_read_trylock(&mm->mmap_sem)
              ...
              handle_mm_fault
                if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
                    // We get here due to the above assumption (PMD entry is zero).
                    do_huge_pmd_anonymous_page
                      alloc_hugepage_vma
                        // Allocate a new transparent huge page here.
                      ...
                      __do_huge_pmd_anonymous_page
                        ...
                        spin_lock(&mm->page_table_lock)
                        ...
                        page_add_new_anon_rmap
                          // Here we increment the page's map count (starts at -1).
                          atomic_set(&page->_mapcount, 0)
                        set_pmd_at
                          // Here we set the page's PMD entry which will be cleared
                          // when Thread A calls pmd_clear_bad().
                        ...
                        spin_unlock(&mm->page_table_lock)
      
          The mmap_sem does not prevent the race because both threads are acquiring
          it in shared mode (down_read).  Thread B holds the page_table_lock while
          the page's map count and PMD table entry are updated.  However, Thread A
          does not synchronize on that lock.
      
      ====== end quote =======
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Reported-by: NUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Acked-by: NLarry Woodman <lwoodman@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>		[2.6.38+]
      Cc: Mark Salter <msalter@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a5a9906
  5. 21 3月, 2012 3 次提交