1. 10 3月, 2017 7 次提交
  2. 02 3月, 2017 2 次提交
  3. 28 2月, 2017 2 次提交
  4. 25 2月, 2017 4 次提交
  5. 23 2月, 2017 16 次提交
  6. 25 1月, 2017 1 次提交
    • A
      userfaultfd: fix SIGBUS resulting from false rwsem wakeups · 15a77c6f
      Andrea Arcangeli 提交于
      With >=32 CPUs the userfaultfd selftest triggered a graceful but
      unexpected SIGBUS because VM_FAULT_RETRY was returned by
      handle_userfault() despite the UFFDIO_COPY wasn't completed.
      
      This seems caused by rwsem waking the thread blocked in
      handle_userfault() and we can't run up_read() before the wait_event
      sequence is complete.
      
      Keeping the wait_even sequence identical to the first one, would require
      running userfaultfd_must_wait() again to know if the loop should be
      repeated, and it would also require retaking the rwsem and revalidating
      the whole vma status.
      
      It seems simpler to wait the targeted wakeup so that if false wakeups
      materialize we still wait for our specific wakeup event, unless of
      course there are signals or the uffd was released.
      
      Debug code collecting the stack trace of the wakeup showed this:
      
        $ ./userfaultfd 100 99999
        nr_pages: 25600, nr_pages_per_cpu: 800
        bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
        bounces: 99997, mode: rnd ver poll, Bus error (core dumped)
      
          save_stack_trace+0x2b/0x50
          try_to_wake_up+0x2a6/0x580
          wake_up_q+0x32/0x70
          rwsem_wake+0xe0/0x120
          call_rwsem_wake+0x1b/0x30
          up_write+0x3b/0x40
          vm_mmap_pgoff+0x9c/0xc0
          SyS_mmap_pgoff+0x1a9/0x240
          SyS_mmap+0x22/0x30
          entry_SYSCALL_64_fastpath+0x1f/0xbd
          0xffffffffffffffff
          FAULT_FLAG_ALLOW_RETRY missing 70
        CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G        W       4.8.0+ #30
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
        Call Trace:
          dump_stack+0xb8/0x112
          handle_userfault+0x572/0x650
          handle_mm_fault+0x12cb/0x1520
          __do_page_fault+0x175/0x500
          trace_do_page_fault+0x61/0x270
          do_async_page_fault+0x19/0x90
          async_page_fault+0x25/0x30
      
      This always happens when the main userfault selftest thread is running
      clone() while glibc runs either mprotect or mmap (both taking mmap_sem
      down_write()) to allocate the thread stack of the background threads,
      while locking/userfault threads already run at full throttle and are
      susceptible to false wakeups that may cause handle_userfault() to return
      before than expected (which results in graceful SIGBUS at the next
      attempt).
      
      This was reproduced only with >=32 CPUs because the loop to start the
      thread where clone() is too quick with fewer CPUs, while with 32 CPUs
      there's already significant activity on ~32 locking and userfault
      threads when the last background threads are started with clone().
      
      This >=32 CPUs SMP race condition is likely reproducible only with the
      selftest because of the much heavier userfault load it generates if
      compared to real apps.
      
      We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a
      patch floating around that provides it also hidden this problem but in
      reality only is successfully at hiding the problem.
      
      False wakeups could still happen again the second time
      handle_userfault() is invoked, even if it's a so rare race condition
      that getting false wakeups twice in a row is impossible to reproduce.
      This full fix is needed for correctness, the only alternative would be
      to allow VM_FAULT_RETRY to be returned infinitely.  With this fix the WP
      support can stick to a strict "one more" VM_FAULT_RETRY logic (no need
      of returning it infinite times to avoid the SIGBUS).
      
      Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: NShubham Kumar Sharma <shubham.kumar.sharma@oracle.com>
      Tested-by: NMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15a77c6f
  7. 15 12月, 2016 1 次提交
  8. 27 7月, 2016 1 次提交
  9. 21 5月, 2016 1 次提交
  10. 03 3月, 2016 1 次提交
    • L
      userfaultfd: don't block on the last VM updates at exit time · 39680f50
      Linus Torvalds 提交于
      The exit path will do some final updates to the VM of an exiting process
      to inform others of the fact that the process is going away.
      
      That happens, for example, for robust futex state cleanup, but also if
      the parent has asked for a TID update when the process exits (we clear
      the child tid field in user space).
      
      However, at the time we do those final VM accesses, we've already
      stopped accepting signals, so the usual "stop waiting for userfaults on
      signal" code in fs/userfaultfd.c no longer works, and the process can
      become an unkillable zombie waiting for something that will never
      happen.
      
      To solve this, just make handle_userfault() abort any user fault
      handling if we're already in the exit path past the signal handling
      state being dead (marked by PF_EXITING).
      
      This VM special case is pretty ugly, and it is possible that we should
      look at finalizing signals later (or move the VM final accesses
      earlier).  But in the meantime this is a fairly minimally intrusive fix.
      Reported-and-tested-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      39680f50
  11. 23 9月, 2015 1 次提交
  12. 18 9月, 2015 1 次提交
  13. 05 9月, 2015 2 次提交
    • A
      userfaultfd: avoid missing wakeups during refile in userfaultfd_read · 2c5b7e1b
      Andrea Arcangeli 提交于
      During the refile in userfaultfd_read both waitqueues could look empty to
      the lockless wake_userfault().  Use a seqcount to prevent this false
      negative that could leave an userfault blocked.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2c5b7e1b
    • A
      userfaultfd: allow signals to interrupt a userfault · dfa37dc3
      Andrea Arcangeli 提交于
      This is only simple to achieve if the userfault is going to return to
      userland (not to the kernel) because we can avoid returning VM_FAULT_RETRY
      despite we temporarily released the mmap_sem.  The fault would just be
      retried by userland then.  This is safe at least on x86 and powerpc (the
      two archs with the syscall implemented so far).
      
      Hint to verify for which archs this is safe: after handle_mm_fault
      returns, no access to data structures protected by the mmap_sem must be
      done by the fault code in arch/*/mm/fault.c until up_read(&mm->mmap_sem)
      is called.
      
      This has two main benefits: signals can run with lower latency in
      production (signals aren't blocked by userfaults and userfaults are
      immediately repeated after signal processing) and gdb can then trivially
      debug the threads blocked in this kind of userfaults coming directly from
      userland.
      
      On a side note: while gdb has a need to get signal processed, coredumps
      always worked perfectly with userfaults, no matter if the userfault is
      triggered by GUP a kernel copy_user or directly from userland.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dfa37dc3