1. 05 Dec 2011: 1 commit
  2. 14 Nov 2011: 3 commits
  3. 12 Nov 2011: 3 commits
  4. 07 Nov 2011: 1 commit
  5. 04 Nov 2011: 1 commit
    • oprofile, x86: Fix crash when unloading module (nmi timer mode) · 97f7f818
      Robert Richter committed
      If oprofile uses the NMI timer interrupt, the kernel crashes while
      unloading the module. The bug can be triggered with oprofile built as
      a module and the kernel parameter nolapic set. This patch fixes this.
      
      oprofile: using NMI timer interrupt.
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
      PGD 42dbca067 PUD 41da6a067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP
      CPU 5
      Modules linked in: oprofile(-) [last unloaded: oprofile]
      
      Pid: 2518, comm: modprobe Not tainted 3.1.0-rc7-00019-gb2fb49d #19 Advanced Micro Device Anaheim/Anaheim
      RIP: 0010:[<ffffffff8123c226>]  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
      RSP: 0018:ffff88041ef71e98  EFLAGS: 00010296
      RAX: 0000000000000000 RBX: ffffffffa0017100 RCX: dead000000200200
      RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
      RBP: ffff88041ef71ea8 R08: 0000000000000001 R09: 0000000000000082
      R10: 0000000000000000 R11: ffff88041ef71de8 R12: 0000000000000080
      R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
      FS:  00007fc902f20700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000008 CR3: 000000041cdb6000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process modprobe (pid: 2518, threadinfo ffff88041ef70000, task ffff88041d348040)
      Stack:
       ffff88041ef71eb8 ffffffffa0017790 ffff88041ef71eb8 ffffffffa0013532
       ffff88041ef71ec8 ffffffffa00132d6 ffff88041ef71ed8 ffffffffa00159b2
       ffff88041ef71f78 ffffffff81073115 656c69666f72706f 0000000000610200
      Call Trace:
       [<ffffffffa0013532>] op_nmi_exit+0x15/0x17 [oprofile]
       [<ffffffffa00132d6>] oprofile_arch_exit+0xe/0x10 [oprofile]
       [<ffffffffa00159b2>] oprofile_exit+0x1e/0x20 [oprofile]
       [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f
       [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b
      Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81 <48> 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
      RIP  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
       RSP <ffff88041ef71e98>
      CR2: 0000000000000008
      ---[ end trace 43a541a52956b7b0 ]---
      
      CC: stable@kernel.org # 2.6.37+
      Signed-off-by: Robert Richter <robert.richter@amd.com>
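The register dump above is consistent with list_del() running on a list node that was never added. CR2 is 0x8, RDX and RAX (the node's next and prev pointers) are both zero, and RSI/RCX hold the list poison values dead000000100100/dead000000200200 that list_del() loads, so the faulting store looks like `next->prev = prev` with `next == NULL`. The log does not include the diff, so what follows is only a minimal userspace sketch of the general guard, assuming the fix amounts to "do not unregister ops that were never registered". `struct demo_ops`, `demo_register()`, and `demo_unregister()` are hypothetical names, not oprofile or syscore APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list modeled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlinking dereferences entry->next and entry->prev. On a zeroed,
 * never-added node next == NULL, so "next->prev = prev" would write to
 * address 0x8. (This sketch clears the pointers instead of poisoning.) */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
}

static bool list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Hypothetical stand-in for a registered set of callbacks. */
struct demo_ops {
    struct list_head node;
    bool registered;            /* true while node is linked */
};

static struct list_head demo_ops_list = { &demo_ops_list, &demo_ops_list };

static void demo_register(struct demo_ops *ops)
{
    list_add_tail(&ops->node, &demo_ops_list);
    ops->registered = true;
}

/* Guarded teardown: only unlink what was actually linked.
 * Returns true if the ops were removed from the list. */
static bool demo_unregister(struct demo_ops *ops)
{
    if (!ops->registered)
        return false;
    list_del(&ops->node);
    ops->registered = false;
    return true;
}
```

With the guard, an exit path that runs after a failed or skipped init (as with `nolapic`) becomes a no-op instead of a NULL dereference.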
  6. 03 Nov 2011: 2 commits
    • thp: share get_huge_page_tail() · b35a35b5
      Andrea Arcangeli committed
      This avoids duplicating the function in every arch gup_fast.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: tail page refcounting fix · 70b50f94
      Andrea Arcangeli committed
      While working on the working set estimation code, Michel noticed that
      calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
      wasn't safe if the pfn ended up being a tail page of a transparent
      hugepage being split by __split_huge_page_refcount().
      
      He then found that the problem could also theoretically materialize with
      page_cache_get_speculative() during the speculative radix tree lookups
      that use get_page_unless_zero() on SMP, if the radix tree page is freed
      and reallocated and get_user_pages() is called on it before
      page_cache_get_speculative() has a chance to call get_page_unless_zero().
      
      So the best way to fix the problem is to keep page_tail->_count zero at
      all times.  This will guarantee that get_page_unless_zero() can never
      succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
      is unused for all tail pages of a compound page, so we can simply
      account the tail page references there and transfer them to
      tail_page->_count in __split_huge_page_refcount() (in addition to the
      head_page->_mapcount).
      
      While debugging this s/_count/_mapcount/ change, I also noticed that
      get_page() is called by direct-io.c on pages returned by
      get_user_pages(). That wasn't entirely safe because the two
      atomic_inc() calls in get_page() weren't atomic as a pair. By contrast,
      other get_user_pages() users, such as the secondary-MMU page fault path
      that establishes the shadow pagetables, never call a superfluous
      get_page() after get_user_pages() returns. It's safer to make
      get_page() universally safe for tail pages and to use get_page_foll()
      within follow_page() (inside get_user_pages()). get_page_foll() can do
      the refcounting for tail pages without taking any locks because it runs
      within PT-lock-protected critical sections (the PT lock for ptes and
      page_table_lock for pmd_trans_huge).
      
      The standard get_page(), as invoked by direct-io, will instead now take
      the compound_lock, but only for tail pages. The direct-io paths are
      usually I/O bound, and the compound_lock is per THP and therefore very
      fine-grained, so there's no risk of scalability issues with it. A
      simple direct-io benchmark with all the lockdep prove-locking and
      spinlock debugging infrastructure enabled shows identical performance
      and no overhead, so it's worth it. Ideally direct-io should stop calling
      get_page() on pages returned by get_user_pages(). The spinlock in
      get_page() is already optimized away for non-THP builds, but doing
      get_page() on tail pages returned by GUP is generally a rare operation
      that usually only runs in I/O paths.
      
      In addition to avoiding new RCU critical sections, this new refcounting
      on page_tail->_mapcount will also allow the working set estimation code
      to work without any further complexity associated with tail page
      refcounting under THP.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
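The refcounting scheme described in this commit can be modeled in a few lines of C11. This is a simplified userspace sketch, not the kernel code: `struct page` here is a hypothetical stand-in with only `count`, `mapcount`, a spinlock standing in for the compound_lock, and a `head` pointer. It keeps `tail->count` at zero so that `get_page_unless_zero()` always fails on a tail, parks tail pins in `tail->mapcount` under the head's lock (the get_page() path; the PT-lock-protected get_page_foll() path is not modeled), and moves them into `tail->count` at split time, together with the head's mapcount. The kernel's exact arithmetic, such as the bias on `_mapcount` and any reference also taken on the head page, is deliberately not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of struct page for THP. */
struct page {
    atomic_int count;     /* _count: kept at 0 on tail pages            */
    atomic_int mapcount;  /* head: mappings of the huge page; tail: pins */
    atomic_flag lock;     /* models the compound_lock, used on the head  */
    struct page *head;    /* NULL for a head or ordinary page            */
};

/* Take a reference only if the count is non-zero (lock-free CAS loop). */
static bool get_page_unless_zero(struct page *p)
{
    int old = atomic_load(&p->count);
    while (old != 0) {
        if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
            return true;
        /* on failure, old now holds the current value: retry */
    }
    return false;
}

static void compound_lock(struct page *head)
{
    while (atomic_flag_test_and_set_explicit(&head->lock, memory_order_acquire))
        ; /* spin */
}

static void compound_unlock(struct page *head)
{
    atomic_flag_clear_explicit(&head->lock, memory_order_release);
}

/* get_page() on a tail page: record the pin in the tail's mapcount,
 * never in its count, while holding the head's lock. */
static void get_tail_page(struct page *tail)
{
    compound_lock(tail->head);
    atomic_fetch_add(&tail->mapcount, 1);
    compound_unlock(tail->head);
}

/* Split: transfer the head's mappings plus the pins parked in the tail's
 * mapcount into the tail's own count; it is then an ordinary page. */
static void split_tail_refcount(struct page *head, struct page *tail)
{
    compound_lock(head);
    atomic_fetch_add(&tail->count,
                     atomic_load(&head->mapcount) + atomic_load(&tail->mapcount));
    atomic_store(&tail->mapcount, 0);
    tail->head = NULL;
    compound_unlock(head);
}
```

Because a tail's count never leaves zero before the split, a speculative lookup that lands on a tail simply fails instead of racing with __split_huge_page_refcount().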
  7. 02 Nov 2011: 24 commits
  8. 01 Nov 2011: 5 commits
    • i7core_edac: Drop the edac_mce facility · 4140c542
      Borislav Petkov committed
      Remove the edac_mce pieces and use the normal MCE decoder notifier chain
      instead, retaining the same functionality with considerably less code.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
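Using a decoder notifier chain means the EDAC driver registers a callback that the MCE code invokes for each error it reports, rather than maintaining a private edac_mce hook. The sketch below is a minimal userspace model of that pattern with hypothetical names (`demo_notifier`, `demo_register_decode_chain()`, `demo_call_chain()`, `demo_edac_decode()`). Its return codes only mirror the idea of the kernel's NOTIFY_DONE/NOTIFY_STOP, and unlike real kernel notifier chains it has no priorities, locking, or RCU protection.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modeled on the idea of NOTIFY_DONE / NOTIFY_STOP. */
enum { DEMO_NOTIFY_DONE = 0, DEMO_NOTIFY_STOP = 1 };

/* Hypothetical, minimal machine-check record. */
struct demo_mce {
    int bank;
    unsigned long long status;
};

struct demo_notifier {
    int (*call)(struct demo_notifier *nb, struct demo_mce *m);
    struct demo_notifier *next;
};

static struct demo_notifier *demo_decode_chain;

/* Push onto the front of the chain (no priority ordering in this model). */
static void demo_register_decode_chain(struct demo_notifier *nb)
{
    nb->next = demo_decode_chain;
    demo_decode_chain = nb;
}

/* Walk the chain until a callback reports that it consumed the event. */
static int demo_call_chain(struct demo_mce *m)
{
    int ret = DEMO_NOTIFY_DONE;
    for (struct demo_notifier *nb = demo_decode_chain; nb; nb = nb->next) {
        ret = nb->call(nb, m);
        if (ret == DEMO_NOTIFY_STOP)
            break;
    }
    return ret;
}

static int demo_events_seen;

/* Hypothetical EDAC-style consumer: count the error, let others see it. */
static int demo_edac_decode(struct demo_notifier *nb, struct demo_mce *m)
{
    (void)nb;
    (void)m;
    demo_events_seen++;
    return DEMO_NOTIFY_DONE;
}
```

The design point is that the producer never needs to know which drivers consume its events, so a driver can drop its custom plumbing and just register a callback.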
    • Cross Memory Attach · fcf63409
      Christopher Yeoh committed
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - It does not allow specifying iovecs for both src and dest; assuming
          preadv or pwritev were implemented, either the area read from or
          the area written to would need to be contiguous.
        - Currently mem_read allows only processes that are currently
          ptrace'ing the target, and are still able to ptrace the target, to
          read from the target. This check could possibly be moved to the
          open call, but it's not clear exactly what race this restriction
          is preventing (the reason appears to have been lost).
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on a unix
          domain socket is a bit ugly from a userspace point of view,
          especially when you may have hundreds if not (eventually)
          thousands of processes that all need to do this with each other.
        - It doesn't allow for some future uses of the interface that we
          would like to consider adding (see below).
        - Interestingly, reading from /proc/pid/mem currently actually
          involves two copies! (But this could be fixed pretty easily.)
      
      As mentioned previously, using vmsplice instead was considered, but it
      has problems. Since the reader and writer must work co-operatively, you
      block if the pipe is not drained, which requires some wrapping to do
      non-blocking sends or polling on the receive side. In all-to-all
      communication it requires ordering, otherwise you can deadlock. And in
      the case of many MPI tasks writing to one MPI task, vmsplice serialises
      the copying.
      
      There are some cases of MPI collectives where even a single-copy
      interface does not give us all the performance gain we could get. For
      example, in an MPI_Reduce, rather than copying the data from the source
      we would like to use it directly in a math operation (say the reduce is
      doing a sum), as this would save us a copy; we don't need to keep a
      copy of the data from the source. I haven't implemented this, but I
      think this interface could in the future do all this through the
      flags: e.g. the flags could specify the math operation and type, and
      rather than just copying the data, the kernel would apply the specified
      operation between the source and destination and store the result in
      the destination.
      
      We don't yet have a "second user" of the interface, though I've had
      some nibbles from people who may be interested in using it for
      intra-process messaging that is not MPI. However, this interface is
      something hardware vendors already implement in their custom drivers
      for fast local communication, so in addition to being useful for
      OpenMPI, it would mean the driver maintainers don't have to fix things
      up when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc; other architectures
      should mainly (I think) just need to add syscall numbers for
      process_vm_readv and process_vm_writev. There are 32-bit compatibility
      versions for 64-bit kernels.
      
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz
      Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
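process_vm_readv() and process_vm_writev() are the system calls added by this commit; glibc (2.15 and later) exposes them through `<sys/uio.h>`. The sketch below copies between buffers of the calling process itself, which keeps it runnable without a second process. In the MPI use case the pid and remote addresses would belong to a peer and be exchanged out of band, and the caller needs ptrace-style access to that target. The helper names `cma_read()`, `cma_write()`, and `cma_gather2()` are hypothetical. `cma_gather2()` shows the iovecs-on-both-sides capability that the commit message says /proc/pid/mem lacks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes at remote_addr in process pid into local_buf with a
 * single kernel-side copy. Returns bytes copied, or -1 with errno set. */
static ssize_t cma_read(pid_t pid, void *local_buf,
                        const void *remote_addr, size_t len)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

/* Symmetric direction: copy from our buffer into process pid. */
static ssize_t cma_write(pid_t pid, void *remote_addr,
                         const void *local_buf, size_t len)
{
    struct iovec local = { .iov_base = (void *)local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    return process_vm_writev(pid, &local, 1, &remote, 1, 0);
}

/* Gather two separate remote regions into one contiguous local buffer. */
static ssize_t cma_gather2(pid_t pid, void *local_buf,
                           const void *r1, size_t n1,
                           const void *r2, size_t n2)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = n1 + n2 };
    struct iovec remote[2] = {
        { .iov_base = (void *)r1, .iov_len = n1 },
        { .iov_base = (void *)r2, .iov_len = n2 },
    };
    return process_vm_readv(pid, &local, 1, remote, 2, 0);
}
```

A partial transfer is possible if a remote range is only partly mapped, so callers should check the returned byte count rather than assume a full copy.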
    • lguest: add export.h to lguest files for THIS_MODULE/EXPORT_SYMBOL · 39a0e33d
      Paul Gortmaker committed
      We need this in advance of the module.h cleanup, or we'll
      get compile errors like this:
      
        CC      drivers/lguest/lguest_device.o
      drivers/lguest/lguest_device.c: In function ‘lguest_devices_init’:
      drivers/lguest/lguest_device.c:490: error: ‘THIS_MODULE’ undeclared (first use in this function)
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • x86: efi_32.c is implicitly getting asm/desc.h via module.h · 783ac47c
      Paul Gortmaker committed
      We want to clean up the chain of includes stumbling through
      module.h, and when we do that, we'll see:
      
        CC      arch/x86/platform/efi/efi_32.o
        efi/efi_32.c: In function ‘efi_call_phys_prelog’:
        efi/efi_32.c:80: error: implicit declaration of function ‘get_cpu_gdt_table’
        efi/efi_32.c:82: error: implicit declaration of function ‘load_gdt’
        make[4]: *** [arch/x86/platform/efi/efi_32.o] Error 1
      
      Include asm/desc.h so that there are no implicit include assumptions.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • x86: fix up files really needing to include module.h · 7c52d551
      Paul Gortmaker committed
      These files aren't just exporting symbols -- they also define
      MODULE_LICENSE etc., so give them the full module.h file.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>