1. 13 Jun, 2023 1 commit
  2. 30 May, 2023 1 commit
  3. 18 May, 2023 7 commits
  4. 19 Apr, 2023 1 commit
  5. 15 Nov, 2021 1 commit
  6. 21 Oct, 2021 1 commit
  7. 15 Oct, 2021 1 commit
  8. 02 Sep, 2021 2 commits
    • userswap: add a kernel parameter to enable userswap · 80efa5c8
      Xiongfeng Wang committed
      hulk inclusion
      category: bugfix
      bugzilla: 175146
      CVE: NA
      
      ------------------------------------
      
      Disable userswap by default and add a kernel parameter to enable it.
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

       Conflicts:
      	include/linux/userfaultfd_k.h
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • userfaultfd: fix BUG_ON() in userfaultfd_release() · 84ff5e27
      Xiongfeng Wang committed
      hulk inclusion
      category: bugfix
      bugzilla: 175146
      CVE: NA
      
      ------------------------------------
      
      Syzkaller caught the following BUG_ON:
      
      ------------[ cut here ]------------
      kernel BUG at fs/userfaultfd.c:909!
      Internal error: Oops - BUG: 0 [#1] SMP
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      Process syz-executor.2 (pid: 1994, stack limit = 0x0000000048da525b)
      CPU: 0 PID: 1994 Comm: syz-executor.2 Not tainted 4.19.90+ #6
      Hardware name: linux,dummy-virt (DT)
      pstate: 80000005 (Nzcv daif -PAN -UAO)
      pc : userfaultfd_release+0x4f0/0x6a0 fs/userfaultfd.c:908
      lr : userfaultfd_release+0x4f0/0x6a0 fs/userfaultfd.c:908
      sp : ffff80017d247c80
      x29: ffff80017d247c90 x28: ffff80019b25f720
      x27: 2000000000100077 x26: ffff80017c28fe40
      x25: ffff80019b25f770 x24: ffff80019b25f7e0
      x23: ffff80019b25e378 x22: 1ffff0002fa48fa6
      x21: ffff80017f103200 x20: dfff200000000000
      x19: ffff80017c28fe40 x18: 0000000000000000
      x17: ffffffff00000001 x16: 0000000000000000
      x15: 0000000000000000 x14: 0000000000000000
      x13: 0000000000000000 x12: 0000000000000000
      x11: 0000000000000000 x10: 0000000000000000
      x9 : 1ffff0002fa48fa6 x8 : ffff10002fa48fa6
      x7 : ffff20000add39f0 x6 : 00000000f2000000
      x5 : 0000000000000000 x4 : ffff10002fa48f76
      x3 : ffff200008000000 x2 : ffff20000a61d000
      x1 : ffff800160aa9000 x0 : 0000000000000000
      Call trace:
       userfaultfd_release+0x4f0/0x6a0 fs/userfaultfd.c:908
       __fput+0x20c/0x688 fs/file_table.c:278
       ____fput+0x24/0x30 fs/file_table.c:309
       task_work_run+0x13c/0x2f8 kernel/task_work.c:135
       tracehook_notify_resume include/linux/tracehook.h:193 [inline]
       do_notify_resume+0x380/0x628 arch/arm64/kernel/signal.c:728
       work_pending+0x8/0x10
      Code: 97ecb0e4 d4210000 17ffffc7 97ecb0e1 (d4210000)
      ---[ end trace de790a3f637d9e60 ]---
      
      In userfaultfd_release(), we check that 'vm_userfaultfd_ctx' and
      'vm_flags&(VM_UFFD_MISSING|VM_UFFD_WP)' are either both zero or both
      non-zero. If not, it is a bug. But the check does not account for the
      VM_USWAP flag. So add it to avoid the false BUG_ON(). This patch also
      fixes several other issues.
      
      Fixes: c3e6287f ("userswap: support userswap via userfaultfd")
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

       Conflicts:
      	fs/userfaultfd.c
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
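
The invariant described in the 84ff5e27 commit message can be modeled in userspace roughly as follows. This is only a sketch: `struct vma_model`, `uffd_vma_consistent()` and the flag bit values are hypothetical stand-ins rather than kernel definitions, and it assumes, as the message implies, that a VMA can carry a userfaultfd context with only VM_USWAP set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values for illustration only; the real VM_* flags
 * are defined in the kernel headers and differ from these. */
#define VM_UFFD_MISSING 0x1UL
#define VM_UFFD_WP      0x2UL
#define VM_USWAP        0x4UL /* hypothetical value for the userswap flag */

/* Simplified stand-in for the two VMA fields the check looks at. */
struct vma_model {
	void *uffd_ctx;         /* models vm_userfaultfd_ctx */
	unsigned long vm_flags; /* models vm_flags */
};

/* Returns true when the VMA is consistent: it has a userfaultfd
 * context if and only if at least one of the tracked flags is set.
 * A false result corresponds to the condition that trips BUG_ON(). */
static bool uffd_vma_consistent(const struct vma_model *vma,
				unsigned long tracked)
{
	bool has_ctx = vma->uffd_ctx != NULL;
	bool has_flag = (vma->vm_flags & tracked) != 0;

	return has_ctx == has_flag;
}
```

With a mask of only `VM_UFFD_MISSING | VM_UFFD_WP`, a VMA that has a context and just `VM_USWAP` set looks inconsistent; adding `VM_USWAP` to the mask makes the same VMA pass.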
  9. 19 Jul, 2021 1 commit
  10. 17 Oct, 2020 1 commit
  11. 04 Aug, 2020 1 commit
    • userfaultfd: simplify fault handling · f9bf3522
      Linus Torvalds committed
      Instead of waiting in a loop for the userfaultfd condition to become
      true, just wait once and return VM_FAULT_RETRY.
      
      We've already dropped the mmap lock, we know we can't really
      successfully handle the fault at this point and the caller will have to
      retry anyway.  So there's no point in making the wait any more
      complicated than it needs to be - just schedule away.
      
      And once you don't have that complexity with explicit looping, you can
      also just lose all the 'userfaultfd_signal_pending()' complexity,
      because once we've set the correct process sleeping state, and don't
      loop, the act of scheduling itself will be checking if there are any
      pending signals before going to sleep.
      
      We can also drop the VM_FAULT_MAJOR games, since we'll be treating all
      retried faults as major soon anyway (series to regularize and share more
      of fault handling across architectures in a separate series by Peter Xu,
      and in the meantime we won't worry about the possible minor - I'll be
      here all week, try the veal - accounting difference).
      
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 29 Jul, 2020 1 commit
  13. 10 Jun, 2020 4 commits
  14. 08 Apr, 2020 4 commits
  15. 03 Apr, 2020 3 commits
    • mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path · 3e69ad08
      Peter Xu committed
      The userfaultfd fault path was killable by default even if the caller
      did not set FAULT_FLAG_KILLABLE.  That made sense earlier, because gup
      did not set FAULT_FLAG_KILLABLE properly.  Now that the previous patch
      applies FAULT_FLAG_KILLABLE in the gup code as well, it also makes
      sense to let userfaultfd honor FAULT_FLAG_KILLABLE.
      
      Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code
      right now, this patch should have no functional change.  It also cleaned
      the code a little bit by introducing some helpers.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce FAULT_FLAG_INTERRUPTIBLE · c270a7ee
      Peter Xu committed
      handle_userfault() is currently the only place in the kernel page
      fault procedures that can respond to non-fatal userspace signals.  It was
      trying to detect such an allowance by checking against USER & KILLABLE
      flags, which was "un-official".

      In this patch, we introduce a new flag (FAULT_FLAG_INTERRUPTIBLE) to show
      that the fault handler allows the fault procedure to respond even to
      non-fatal signals.  Meanwhile, add this new flag to the default fault
      flags so that all the page fault handlers can benefit from the new flag.
      With that, replace the userfault check with this one.
      
      Since the line is getting even longer, clean up the fault flags a bit too
      to ease TTY users.
      
      Although we've got a new flag and applied it, we shouldn't have any
      functional change with this patch so far.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
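
Taken together, the 3e69ad08 and c270a7ee messages describe a fault path whose sleep behaviour follows the fault flags instead of being killable unconditionally. A minimal userspace model of that mapping might look like the sketch below. The enum names, their values and `pick_sleep_state()` are assumptions made for illustration; the real flags and task states are defined in the kernel headers.

```c
/* Placeholder flag bits; not the kernel's actual values. */
enum fault_flag_model {
	FAULT_FLAG_KILLABLE      = 1u << 0,
	FAULT_FLAG_INTERRUPTIBLE = 1u << 1,
};

/* Models how deeply the faulting task is allowed to sleep. */
enum sleep_state_model {
	SLEEP_UNINTERRUPTIBLE, /* no signal wakes the task          */
	SLEEP_KILLABLE,        /* only fatal signals wake the task  */
	SLEEP_INTERRUPTIBLE,   /* any pending signal wakes the task */
};

/* Pick the most responsive sleep state the caller's flags permit.
 * INTERRUPTIBLE is checked first because it also covers fatal signals. */
static enum sleep_state_model pick_sleep_state(unsigned int fault_flags)
{
	if (fault_flags & FAULT_FLAG_INTERRUPTIBLE)
		return SLEEP_INTERRUPTIBLE;
	if (fault_flags & FAULT_FLAG_KILLABLE)
		return SLEEP_KILLABLE;
	return SLEEP_UNINTERRUPTIBLE;
}
```

In this model a caller that passes neither flag sleeps uninterruptibly, which is the case that honoring FAULT_FLAG_KILLABLE makes possible.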
    • userfaultfd: don't retake mmap_sem to emulate NOPAGE · ef429ee7
      Peter Xu committed
      This patch removes the risky path in handle_userfault(), so that we can
      be sure that the callers of handle_mm_fault() will know that the VMAs
      might have changed.  Meanwhile, with the previous patch we don't lose
      responsiveness either, since the core mm code can now handle nonfatal
      userspace signals even if we return VM_FAULT_RETRY.
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 02 Dec, 2019 2 commits
  17. 23 Oct, 2019 1 commit
  18. 26 Sep, 2019 1 commit
  19. 25 Aug, 2019 1 commit
  20. 05 Jul, 2019 1 commit
    • fs/userfaultfd.c: disable irqs for fault_pending and event locks · cbcfa130
      Eric Biggers committed
      When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.
      
      This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
      userfaultfd_ctx_read(), which in turn can be waiting for
      userfaultfd_ctx::fault_pending_wqh.lock or
      userfaultfd_ctx::event_wqh.lock.
      
      But elsewhere the fault_pending_wqh and event_wqh locks are taken with
      IRQs enabled.  Since the IRQ handler may take kioctx::ctx_lock, lockdep
      reports that a deadlock is possible.
      
      Fix it by always disabling IRQs when taking the fault_pending_wqh and
      event_wqh locks.
      
      Commit ae62c16e ("userfaultfd: disable irqs when taking the
      waitqueue lock") didn't fix this because it only accounted for the
      fd_wqh lock, not the other locks nested inside it.
      
      Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
      Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
      Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 19 Jun, 2019 1 commit
  22. 15 May, 2019 1 commit
    • userfaultfd/sysctl: add vm.unprivileged_userfaultfd · cefdca0a
      Peter Xu committed
      Userfaultfd can be misused to make it easier to exploit existing
      use-after-free (and similar) bugs that might otherwise only make a
      short window or race condition available.  By using userfaultfd to
      stall a kernel thread, a malicious program can keep some state that it
      wrote, stable for an extended period, which it can then access using an
      existing exploit.  While it doesn't cause the exploit itself, and while
      it's not the only thing that can stall a kernel thread when accessing a
      memory location, it's one of the few that never needs privilege.
      
      We can add a flag, allowing userfaultfd to be restricted, so that in
      general it won't be useable by arbitrary user programs, but in
      environments that require userfaultfd it can be turned back on.
      
      Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
      whether userfaultfd is allowed by unprivileged users.  When this is
      set to zero, only privileged users (root user, or users with the
      CAP_SYS_PTRACE capability) will be able to use the userfaultfd
      syscalls.
      
      Andrea said:
      
      : The only difference between the bpf sysctl and the userfaultfd sysctl
      : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
      : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
      : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
      : already if it's doing other kind of tracking on processes runtime, in
      : addition of userfaultfd.  In other words both syscalls works only for
      : root, when the two sysctl are opt-in set to 1.
      
      [dgilbert@redhat.com: changelog additions]
      [akpm@linux-foundation.org: documentation tweak, per Mike]
      Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
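
The sysctl added by cefdca0a is exposed under procfs, so a program can inspect it before trying to use userfaultfd. The sketch below reads the knob and applies the rule stated in the commit message. The helper names `read_int_sysctl()` and `uffd_allowed()` are hypothetical, and the decision function reflects only the behaviour described in this commit, which other kernel versions may not share.

```c
#include <stdio.h>

/* procfs path of the vm.unprivileged_userfaultfd sysctl. */
#define UFFD_SYSCTL_PATH "/proc/sys/vm/unprivileged_userfaultfd"

/* Reads an integer sysctl value from `path`.
 * Returns the value, or -1 if the file is missing or unparsable
 * (for example on a kernel that predates the knob). */
static int read_int_sysctl(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Rule from the commit message: with the knob at 0, only privileged
 * callers (root or CAP_SYS_PTRACE) may use userfaultfd; any other
 * value, including "knob absent" (-1), leaves it unrestricted. */
static int uffd_allowed(int knob, int is_privileged)
{
	return knob != 0 || is_privileged;
}
```

A caller would typically combine the two as `uffd_allowed(read_int_sysctl(UFFD_SYSCTL_PATH), is_privileged)`, where how `is_privileged` is determined is left to the application.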
  23. 20 Apr, 2019 1 commit
    • coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping · 04f5866e
      Andrea Arcangeli committed
      The core dumping code has always run without holding the mmap_sem for
      writing, even though that is the only way to ensure that the entire vma
      layout will not change from under it.  Only using some signal
      serialization on the processes belonging to the mm is not nearly enough.
      This was pointed out earlier.  For example in Hugh's post from Jul 2017:
      
        https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
      
        "Not strictly relevant here, but a related note: I was very surprised
         to discover, only quite recently, how handle_mm_fault() may be called
         without down_read(mmap_sem) - when core dumping. That seems a
         misguided optimization to me, which would also be nice to correct"
      
      In particular because the growsdown and growsup can move the
      vm_start/vm_end the various loops the core dump does around the vma will
      not be consistent if page faults can happen concurrently.
      
      Pretty much all users calling mmget_not_zero()/get_task_mm() and then
      taking the mmap_sem had the potential to introduce unexpected side
      effects in the core dumping code.
      
      Adding mmap_sem for writing around the ->core_dump invocation is a
      viable long term fix, but it requires removing all copy user and page
      faults and to replace them with get_dump_page() for all binary formats
      which is not suitable as a short term fix.
      
      For the time being this solution manually covers the places that can
      confuse the core dump either by altering the vma layout or the vma flags
      while it runs.  Once ->core_dump runs under mmap_sem for writing the
      function mmget_still_valid() can be dropped.
      
      Allowing mmap_sem protected sections to run in parallel with the
      coredump provides some minor parallelism advantage to the swapoff code
      (which seems to be safe enough by never mangling any vma field and can
      keep doing swapins in parallel to the core dumping) and to some other
      corner case.
      
      In order to facilitate the backporting I added "Fixes: 86039bd3"
      however the side effect of this same race condition in /proc/pid/mem
      should be reproducible since before 2.6.12-rc2 so I couldn't add any
      other "Fixes:" because there's no hash beyond the git genesis commit.
      
      Because find_extend_vma() is the only location outside of the process
      context that could modify the "mm" structures under mmap_sem for
      reading, by adding the mmget_still_valid() check to it, all other cases
      that take the mmap_sem for reading don't need the new check after
      mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
      context also doesn't need the new check, because all tasks under core
      dumping are frozen.
      
      Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
      Fixes: 86039bd3 ("userfaultfd: add new syscall to provide memory externalization")
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Jann Horn <jannh@google.com>
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Acked-by: Jason Gunthorpe <jgg@mellanox.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 29 Dec, 2018 1 commit