1. 26 10月, 2021 1 次提交
  2. 21 10月, 2021 1 次提交
  3. 28 8月, 2021 1 次提交
    • T
      eventfd: Make signal recursion protection a task bit · b542e383
      Thomas Gleixner 提交于
      The recursion protection for eventfd_signal() is based on a per CPU
      variable and relies on the !RT semantics of spin_lock_irqsave() for
      protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither
      disables preemption nor interrupts which allows the spin lock held section
      to be preempted. If the preempting task invokes eventfd_signal() as well,
      then the recursion warning triggers.
      
      Paolo suggested to protect the per CPU variable with a local lock, but
      that's heavyweight and actually not necessary. The goal of this protection
      is to prevent the task stack from overflowing, which can be achieved with a
      per task recursion protection as well.
      
      Replace the per CPU variable with a per task bit similar to other recursion
      protection bits like task_struct::in_page_owner. This works on both !RT and
      RT kernels and removes as a side effect the extra per CPU storage.
      
      No functional change for !RT kernels.
      Reported-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
      b542e383
  4. 01 5月, 2021 1 次提交
  5. 16 12月, 2020 1 次提交
    • D
      mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio · cd544fd1
      Dmitry Safonov 提交于
      As kernel expect to see only one of such mappings, any further operations
      on the VMA-copy may be unexpected by the kernel.  Maybe it's being on the
      safe side, but there doesn't seem to be any expected use-case for this, so
      restrict it now.
      
      Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com
      Fixes: commit e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd544fd1
  6. 11 11月, 2020 1 次提交
  7. 07 11月, 2020 1 次提交
  8. 03 10月, 2020 1 次提交
  9. 24 8月, 2020 1 次提交
  10. 08 8月, 2020 1 次提交
  11. 16 6月, 2020 1 次提交
  12. 11 6月, 2020 1 次提交
  13. 10 6月, 2020 1 次提交
  14. 14 5月, 2020 1 次提交
    • M
      aio: fix async fsync creds · 530f32fc
      Miklos Szeredi 提交于
      Avi Kivity reports that on fuse filesystems running in a user namespace
      asyncronous fsync fails with EOVERFLOW.
      
      The reason is that f_ops->fsync() is called with the creds of the kthread
      performing aio work instead of the creds of the process originally
      submitting IOCB_CMD_FSYNC.
      
      Fuse sends the creds of the caller in the request header and it needs to
      translate the uid and gid into the server's user namespace.  Since the
      kthread is running in init_user_ns, the translation will fail and the
      operation returns an error.
      
      It can be argued that fsync doesn't actually need any creds, but just
      zeroing out those fields in the header (as with requests that currently
      don't take creds) is a backward compatibility risk.
      
      Instead of working around this issue in fuse, solve the core of the problem
      by calling the filesystem with the proper creds.
      Reported-by: NAvi Kivity <avi@scylladb.com>
      Tested-by: NGiuseppe Scrivano <gscrivan@redhat.com>
      Fixes: c9582eb0 ("fuse: Fail all requests with invalid uids or gids")
      Cc: stable@vger.kernel.org  # 4.18+
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      530f32fc
  15. 04 2月, 2020 1 次提交
    • J
      aio: prevent potential eventfd recursion on poll · 01d7a356
      Jens Axboe 提交于
      If we have nested or circular eventfd wakeups, then we can deadlock if
      we run them inline from our poll waitqueue wakeup handler. It's also
      possible to have very long chains of notifications, to the extent where
      we could risk blowing the stack.
      
      Check the eventfd recursion count before calling eventfd_signal(). If
      it's non-zero, then punt the signaling to async context. This is always
      safe, as it takes us out-of-line in terms of stack and locking context.
      
      Cc: stable@vger.kernel.org # 4.19+
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      01d7a356
  16. 15 11月, 2019 1 次提交
  17. 22 10月, 2019 1 次提交
    • G
      aio: Fix io_pgetevents() struct __compat_aio_sigset layout · 97eba80f
      Guillem Jover 提交于
      This type is used to pass the sigset_t from userland to the kernel,
      but it was using the kernel native pointer type for the member
      representing the compat userland pointer to the userland sigset_t.
      
      This messes up the layout, and makes the kernel eat up both the
      userland pointer and the size members into the kernel pointer, and
      then reads garbage into the kernel sigsetsize. Which makes the sigset_t
      size consistency check fail, and consequently the syscall always
      returns -EINVAL.
      
      This breaks both libaio and strace on 32-bit userland running on 64-bit
      kernels. And there are apparently no users in the wild of the current
      broken layout (at least according to codesearch.debian.org and a brief
      check over github.com search). So it looks safe to fix this directly
      in the kernel, instead of either letting userland deal with this
      permanently with the additional overhead or trying to make the syscall
      infer what layout userland used, even though this is also being worked
      around in libaio to temporarily cope with kernels that have not yet
      been fixed.
      
      We use a proper compat_uptr_t instead of a compat_sigset_t pointer.
      
      Fixes: 7a074e96 ("aio: implement io_pgetevents")
      Signed-off-by: NGuillem Jover <guillem@hadrons.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      97eba80f
  18. 19 7月, 2019 1 次提交
  19. 17 7月, 2019 1 次提交
  20. 29 6月, 2019 1 次提交
    • O
      signal: remove the wrong signal_pending() check in restore_user_sigmask() · 97abc889
      Oleg Nesterov 提交于
      This is the minimal fix for stable, I'll send cleanups later.
      
      Commit 854a6ed5 ("signal: Add restore_user_sigmask()") introduced
      the visible change which breaks user-space: a signal temporary unblocked
      by set_user_sigmask() can be delivered even if the caller returns
      success or timeout.
      
      Change restore_user_sigmask() to accept the additional "interrupted"
      argument which should be used instead of signal_pending() check, and
      update the callers.
      
      Eric said:
      
      : For clarity.  I don't think this is required by posix, or fundamentally to
      : remove the races in select.  It is what linux has always done and we have
      : applications who care so I agree this fix is needed.
      :
      : Further in any case where the semantic change that this patch rolls back
      : (aka where allowing a signal to be delivered and the select like call to
      : complete) would be advantage we can do as well if not better by using
      : signalfd.
      :
      : Michael is there any chance we can get this guarantee of the linux
      : implementation of pselect and friends clearly documented.  The guarantee
      : that if the system call completes successfully we are guaranteed that no
      : signal that is unblocked by using sigmask will be delivered?
      
      Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
      Fixes: 854a6ed5 ("signal: Add restore_user_sigmask()")
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NEric Wong <e@80x24.org>
      Tested-by: NEric Wong <e@80x24.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97abc889
  21. 01 6月, 2019 1 次提交
  22. 26 5月, 2019 2 次提交
    • D
      vfs: Convert aio to use the new mount API · 52db59df
      David Howells 提交于
      Convert the aio filesystem to the new internal mount API as the old
      one will be obsoleted and removed.  This allows greater flexibility in
      communication of mount parameters between userspace, the VFS and the
      filesystem.
      
      See Documentation/filesystems/mount_api.txt for more information.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Benjamin LaHaise <bcrl@kvack.org>
      cc: linux-aio@kvack.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      52db59df
    • A
      mount_pseudo(): drop 'name' argument, switch to d_make_root() · 1f58bb18
      Al Viro 提交于
      Once upon a time we used to set ->d_name of e.g. pipefs root
      so that d_path() on pipes would work.  These days it's
      completely pointless - dentries of pipes are not even connected
      to pipefs root.  However, mount_pseudo() had set the root
      dentry name (passed as the second argument) and callers
      kept inventing names to pass to it.  Including those that
      didn't *have* any non-root dentries to start with...
      
      All of that had been pointless for about 8 years now; it's
      time to get rid of that cargo-culting...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1f58bb18
  23. 05 4月, 2019 1 次提交
  24. 04 4月, 2019 1 次提交
  25. 18 3月, 2019 9 次提交
    • A
      aio: move sanity checks and request allocation to io_submit_one() · 7316b49c
      Al Viro 提交于
      makes for somewhat cleaner control flow in __io_submit_one()
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7316b49c
    • A
      deal with get_reqs_available() in aio_get_req() itself · fa0ca2ae
      Al Viro 提交于
      simplifies the caller
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fa0ca2ae
    • A
      aio: move dropping ->ki_eventfd into iocb_destroy() · 74259703
      Al Viro 提交于
      no reason to duplicate that...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      74259703
    • A
      make aio_read()/aio_write() return int · 958c13ce
      Al Viro 提交于
      that ssize_t is a rudiment of earlier calling conventions; it's been
      used only to pass 0 and -E... since last autumn.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      958c13ce
    • A
      Fix aio_poll() races · af5c72b1
      Al Viro 提交于
      aio_poll() has to cope with several unpleasant problems:
      	* requests that might stay around indefinitely need to
      be made visible for io_cancel(2); that must not be done to
      a request already completed, though.
      	* in cases when ->poll() has placed us on a waitqueue,
      wakeup might have happened (and request completed) before ->poll()
      returns.
      	* worse, in some early wakeup cases request might end
      up re-added into the queue later - we can't treat "woken up and
      currently not in the queue" as "it's not going to stick around
      indefinitely"
      	* ... moreover, ->poll() might have decided not to
      put it on any queues to start with, and that needs to be distinguished
      from the previous case
      	* ->poll() might have tried to put us on more than one queue.
      Only the first will succeed for aio poll, so we might end up missing
      wakeups.  OTOH, we might very well notice that only after the
      wakeup hits and request gets completed (all before ->poll() gets
      around to the second poll_wait()).  In that case it's too late to
      decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's aio_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
      
      It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
      suck, and unfortunately aio is not an exception...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      af5c72b1
    • A
      aio: store event at final iocb_put() · 2bb874c0
      Al Viro 提交于
      Instead of having aio_complete() set ->ki_res.{res,res2}, do that
      explicitly in its callers, drop the reference (as aio_complete()
      used to do) and delay the rest until the final iocb_put().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2bb874c0
    • A
      aio: keep io_event in aio_kiocb · a9339b78
      Al Viro 提交于
      We want to separate forming the resulting io_event from putting it
      into the ring buffer.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a9339b78
    • A
      aio: fold lookup_kiocb() into its sole caller · 833f4154
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      833f4154
    • L
      pin iocb through aio. · b53119f1
      Linus Torvalds 提交于
      aio_poll() is not the only case that needs file pinned; worse, while
      aio_read()/aio_write() can live without pinning iocb itself, the
      proof is rather brittle and can easily break on later changes.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b53119f1
  26. 05 3月, 2019 1 次提交
    • L
      aio: simplify - and fix - fget/fput for io_submit() · 84c4e1f8
      Linus Torvalds 提交于
      Al Viro root-caused a race where the IOCB_CMD_POLL handling of
      fget/fput() could cause us to access the file pointer after it had
      already been freed:
      
       "In more details - normally IOCB_CMD_POLL handling looks so:
      
         1) io_submit(2) allocates aio_kiocb instance and passes it to
            aio_poll()
      
         2) aio_poll() resolves the descriptor to struct file by req->file =
            fget(iocb->aio_fildes)
      
         3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
            aio_kiocb to 2 (bumps by 1, that is).
      
         4) aio_poll() calls vfs_poll(). After sanity checks (basically,
            "poll_wait() had been called and only once") it locks the queue.
            That's what the extra reference to iocb had been for - we know we
            can safely access it.
      
         5) With queue locked, we check if ->woken has already been set to
            true (by aio_poll_wake()) and, if it had been, we unlock the
            queue, drop a reference to aio_kiocb and bugger off - at that
            point it's a responsibility to aio_poll_wake() and the stuff
            called/scheduled by it. That code will drop the reference to file
            in req->file, along with the other reference to our aio_kiocb.
      
         6) otherwise, we see whether we need to wait. If we do, we unlock the
            queue, drop one reference to aio_kiocb and go away - eventual
            wakeup (or cancel) will deal with the reference to file and with
            the other reference to aio_kiocb
      
         7) otherwise we remove ourselves from waitqueue (still under the
            queue lock), so that wakeup won't get us. No async activity will
            be happening, so we can safely drop req->file and iocb ourselves.
      
        If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
        won't get freed under us, so we can do all the checks and locking
        safely. And we don't touch ->file if we detect that case.
      
        However, vfs_poll() most certainly *does* touch the file it had been
        given. So wakeup coming while we are still in ->poll() might end up
        doing fput() on that file. That case is not too rare, and usually we
        are saved by the still present reference from descriptor table - that
        fput() is not the final one.
      
        But if another thread closes that descriptor right after our fget()
        and wakeup does happen before ->poll() returns, we are in trouble -
        final fput() done while we are in the middle of a method:
      
      Al also wrote a patch to take an extra reference to the file descriptor
      to fix this, but I instead suggested we just streamline the whole file
      pointer handling by submit_io() so that the generic aio submission code
      simply keeps the file pointer around until the aio has completed.
      
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      84c4e1f8
  27. 22 2月, 2019 1 次提交
    • B
      aio: Fix locking in aio_poll() · d3d6a18d
      Bart Van Assche 提交于
      wake_up_locked() may but does not have to be called with interrupts
      disabled. Since the fuse filesystem calls wake_up_locked() without
      disabling interrupts aio_poll_wake() may be called with interrupts
      enabled. Since the kioctx.ctx_lock may be acquired from IRQ context,
      all code that acquires that lock from thread context must disable
      interrupts. Hence change the spin_trylock() call in aio_poll_wake()
      into a spin_trylock_irqsave() call. This patch fixes the following
      lockdep complaint:
      
      =====================================================
      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      5.0.0-rc4-next-20190131 #23 Not tainted
      -----------------------------------------------------
      syz-executor2/13779 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      0000000098ac1230 (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:329 [inline]
      0000000098ac1230 (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1772 [inline]
      0000000098ac1230 (&fiq->waitq){+.+.}, at: __io_submit_one fs/aio.c:1875 [inline]
      0000000098ac1230 (&fiq->waitq){+.+.}, at: io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
      
      and this task is already holding:
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:354 [inline]
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1771 [inline]
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one fs/aio.c:1875 [inline]
      000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: io_submit_one+0xeb6/0x1cf0 fs/aio.c:1908
      which would create a new lock dependency:
       (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
      
      but this new dependency connects a SOFTIRQ-irq-safe lock:
       (&(&ctx->ctx_lock)->rlock){..-.}
      
      ... which became SOFTIRQ-irq-safe at:
        lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
        __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
        _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
        spin_lock_irq include/linux/spinlock.h:354 [inline]
        free_ioctx_users+0x2d/0x4a0 fs/aio.c:610
        percpu_ref_put_many include/linux/percpu-refcount.h:285 [inline]
        percpu_ref_put include/linux/percpu-refcount.h:301 [inline]
        percpu_ref_call_confirm_rcu lib/percpu-refcount.c:123 [inline]
        percpu_ref_switch_to_atomic_rcu+0x3e7/0x520 lib/percpu-refcount.c:158
        __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
        rcu_do_batch kernel/rcu/tree.c:2486 [inline]
        invoke_rcu_callbacks kernel/rcu/tree.c:2799 [inline]
        rcu_core+0x928/0x1390 kernel/rcu/tree.c:2780
        __do_softirq+0x266/0x95a kernel/softirq.c:292
        run_ksoftirqd kernel/softirq.c:654 [inline]
        run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
        smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
        kthread+0x357/0x430 kernel/kthread.c:247
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      to a SOFTIRQ-irq-unsafe lock:
       (&fiq->waitq){+.+.}
      
      ... which became SOFTIRQ-irq-unsafe at:
      ...
        lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
        spin_lock include/linux/spinlock.h:329 [inline]
        flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
        fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
        fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
        fuse_send_init fs/fuse/inode.c:989 [inline]
        fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
        mount_nodev+0x68/0x110 fs/super.c:1392
        fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
        legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
        vfs_get_tree+0x123/0x450 fs/super.c:1481
        do_new_mount fs/namespace.c:2610 [inline]
        do_mount+0x1436/0x2c40 fs/namespace.c:2932
        ksys_mount+0xdb/0x150 fs/namespace.c:3148
        __do_sys_mount fs/namespace.c:3162 [inline]
        __se_sys_mount fs/namespace.c:3159 [inline]
        __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&fiq->waitq);
                                     local_irq_disable();
                                     lock(&(&ctx->ctx_lock)->rlock);
                                     lock(&fiq->waitq);
        <Interrupt>
          lock(&(&ctx->ctx_lock)->rlock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor2/13779:
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:354 [inline]
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1771 [inline]
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one fs/aio.c:1875 [inline]
       #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: io_submit_one+0xeb6/0x1cf0 fs/aio.c:1908
      
      the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
      -> (&(&ctx->ctx_lock)->rlock){..-.} {
         IN-SOFTIRQ-W at:
                          lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                          __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
                          _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
                          spin_lock_irq include/linux/spinlock.h:354 [inline]
                          free_ioctx_users+0x2d/0x4a0 fs/aio.c:610
                          percpu_ref_put_many include/linux/percpu-refcount.h:285 [inline]
                          percpu_ref_put include/linux/percpu-refcount.h:301 [inline]
                          percpu_ref_call_confirm_rcu lib/percpu-refcount.c:123 [inline]
                          percpu_ref_switch_to_atomic_rcu+0x3e7/0x520 lib/percpu-refcount.c:158
                          __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
                          rcu_do_batch kernel/rcu/tree.c:2486 [inline]
                          invoke_rcu_callbacks kernel/rcu/tree.c:2799 [inline]
                          rcu_core+0x928/0x1390 kernel/rcu/tree.c:2780
                          __do_softirq+0x266/0x95a kernel/softirq.c:292
                          run_ksoftirqd kernel/softirq.c:654 [inline]
                          run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
                          smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
                          kthread+0x357/0x430 kernel/kthread.c:247
                          ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
         INITIAL USE at:
                         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                         __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
                         _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
                         spin_lock_irq include/linux/spinlock.h:354 [inline]
                         __do_sys_io_cancel fs/aio.c:2052 [inline]
                         __se_sys_io_cancel fs/aio.c:2035 [inline]
                         __x64_sys_io_cancel+0xd5/0x5a0 fs/aio.c:2035
                         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                         entry_SYSCALL_64_after_hwframe+0x49/0xbe
       }
       ... key      at: [<ffffffff8a574140>] __key.52370+0x0/0x40
       ... acquired at:
         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
         spin_lock include/linux/spinlock.h:329 [inline]
         aio_poll fs/aio.c:1772 [inline]
         __io_submit_one fs/aio.c:1875 [inline]
         io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
         __do_sys_io_submit fs/aio.c:1953 [inline]
         __se_sys_io_submit fs/aio.c:1923 [inline]
         __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      the dependencies between the lock to be acquired
       and SOFTIRQ-irq-unsafe lock:
      -> (&fiq->waitq){+.+.} {
         HARDIRQ-ON-W at:
                          lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                          __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                          _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                          spin_lock include/linux/spinlock.h:329 [inline]
                          flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                          fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                          fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                          fuse_send_init fs/fuse/inode.c:989 [inline]
                          fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                          mount_nodev+0x68/0x110 fs/super.c:1392
                          fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                          legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                          vfs_get_tree+0x123/0x450 fs/super.c:1481
                          do_new_mount fs/namespace.c:2610 [inline]
                          do_mount+0x1436/0x2c40 fs/namespace.c:2932
                          ksys_mount+0xdb/0x150 fs/namespace.c:3148
                          __do_sys_mount fs/namespace.c:3162 [inline]
                          __se_sys_mount fs/namespace.c:3159 [inline]
                          __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                          do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                          entry_SYSCALL_64_after_hwframe+0x49/0xbe
         SOFTIRQ-ON-W at:
                          lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                          __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                          _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                          spin_lock include/linux/spinlock.h:329 [inline]
                          flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                          fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                          fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                          fuse_send_init fs/fuse/inode.c:989 [inline]
                          fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                          mount_nodev+0x68/0x110 fs/super.c:1392
                          fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                          legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                          vfs_get_tree+0x123/0x450 fs/super.c:1481
                          do_new_mount fs/namespace.c:2610 [inline]
                          do_mount+0x1436/0x2c40 fs/namespace.c:2932
                          ksys_mount+0xdb/0x150 fs/namespace.c:3148
                          __do_sys_mount fs/namespace.c:3162 [inline]
                          __se_sys_mount fs/namespace.c:3159 [inline]
                          __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                          do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                          entry_SYSCALL_64_after_hwframe+0x49/0xbe
         INITIAL USE at:
                         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                         spin_lock include/linux/spinlock.h:329 [inline]
                         flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                         fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                         fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                         fuse_send_init fs/fuse/inode.c:989 [inline]
                         fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                         mount_nodev+0x68/0x110 fs/super.c:1392
                         fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                         legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                         vfs_get_tree+0x123/0x450 fs/super.c:1481
                         do_new_mount fs/namespace.c:2610 [inline]
                         do_mount+0x1436/0x2c40 fs/namespace.c:2932
                         ksys_mount+0xdb/0x150 fs/namespace.c:3148
                         __do_sys_mount fs/namespace.c:3162 [inline]
                         __se_sys_mount fs/namespace.c:3159 [inline]
                         __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                         entry_SYSCALL_64_after_hwframe+0x49/0xbe
       }
       ... key      at: [<ffffffff8a60dec0>] __key.43450+0x0/0x40
       ... acquired at:
         lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
         spin_lock include/linux/spinlock.h:329 [inline]
         aio_poll fs/aio.c:1772 [inline]
         __io_submit_one fs/aio.c:1875 [inline]
         io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
         __do_sys_io_submit fs/aio.c:1953 [inline]
         __se_sys_io_submit fs/aio.c:1923 [inline]
         __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      stack backtrace:
      CPU: 0 PID: 13779 Comm: syz-executor2 Not tainted 5.0.0-rc4-next-20190131 #23
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_bad_irq_dependency kernel/locking/lockdep.c:1573 [inline]
       check_usage.cold+0x60f/0x940 kernel/locking/lockdep.c:1605
       check_irq_usage kernel/locking/lockdep.c:1650 [inline]
       check_prev_add_irq kernel/locking/lockdep_states.h:8 [inline]
       check_prev_add kernel/locking/lockdep.c:1860 [inline]
       check_prevs_add kernel/locking/lockdep.c:1968 [inline]
       validate_chain kernel/locking/lockdep.c:2339 [inline]
       __lock_acquire+0x1f12/0x4790 kernel/locking/lockdep.c:3320
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
       spin_lock include/linux/spinlock.h:329 [inline]
       aio_poll fs/aio.c:1772 [inline]
       __io_submit_one fs/aio.c:1875 [inline]
       io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
       __do_sys_io_submit fs/aio.c:1953 [inline]
       __se_sys_io_submit fs/aio.c:1923 [inline]
       __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: <stable@vger.kernel.org>
      Fixes: e8693bcf ("aio: allow direct aio poll comletions for keyed wakeups") # v4.19
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      [ bvanassche: added a comment ]
      Reluctantly-Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d3d6a18d
  28. 07 2月, 2019 1 次提交
    • A
      y2038: syscalls: rename y2038 compat syscalls · 8dabe724
      Arnd Bergmann 提交于
      A lot of system calls that pass a time_t somewhere have an implementation
      using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have
      been reworked so that this implementation can now be used on 32-bit
      architectures as well.
      
      The missing step is to redefine them using the regular SYSCALL_DEFINEx()
      to get them out of the compat namespace and make it possible to build them
      on 32-bit architectures.
      
      Any system call that ends in 'time' gets a '32' suffix on its name for
      that version, while the others get a '_time32' suffix, to distinguish
      them from the normal version, which takes a 64-bit time argument in the
      future.
      
      In this step, only 64-bit architectures are changed, doing this rename
      first lets us avoid touching the 32-bit architectures twice.
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      8dabe724
  29. 06 2月, 2019 1 次提交
  30. 29 12月, 2018 1 次提交
  31. 18 12月, 2018 1 次提交