1. 08 Dec, 2019 7 commits
    • L
      pipe: don't use 'pipe_wait() for basic pipe IO · 85190d15
      Linus Torvalds committed
      pipe_wait() may be simple, but since it relies on the pipe lock, it
      means that we have to do the wakeup while holding the lock.  That's
      unfortunate, because the very first thing the waked entity will want to
      do is to get the pipe lock for itself.
      
      So get rid of the pipe_wait() usage by simply releasing the pipe lock,
      doing the wakeup (if required) and then using wait_event_interruptible()
      to wait on the right condition instead.
      
      wait_event_interruptible() handles races on its own by comparing the
      wakeup condition before and after adding itself to the wait queue, so
      you can use an optimistic unlocked condition for it.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      85190d15
    • L
      pipe: remove 'waiting_writers' merging logic · a28c8b9d
      Linus Torvalds committed
      This code is ancient, and goes back to when we only had a single page
      for the pipe buffers.  The exact history is hidden in the mists of time
      (ie "before git", and in fact predates the BK repository too).
      
      At that long-ago point in time, it actually helped to try to merge big
      back-and-forth pipe reads and writes, and not limit pipe reads to the
      single pipe buffer in length just because that was all we had at a time.
      
      However, since then we've expanded the pipe buffers to multiple pages,
      and this logic really doesn't seem to make sense.  And a lot of it is
      somewhat questionable (ie "hmm, the user asked for a non-blocking read,
      but we see that there's a writer pending, so let's wait anyway to get
      the extra data that the writer will have").
      
      But more importantly, it makes the "go to sleep" logic much less
      obvious, and considering the wakeup issues we've had, I want to make for
      less of those kinds of things.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a28c8b9d
    • L
      pipe: fix and clarify pipe read wakeup logic · f467a6a6
      Linus Torvalds committed
      This is the read side version of the previous commit: it simplifies the
      logic to only wake up waiting writers when necessary, and makes sure to
      use a synchronous wakeup.  This time not so much for GNU make jobserver
      reasons (that pipe never fills up), but simply to get the writer going
      quickly again.
      
      A bit less verbose commentary this time, if only because I assume that
      the write side commentary isn't going to be ignored if you touch this
      code.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f467a6a6
    • L
      pipe: fix and clarify pipe write wakeup logic · 1b6b26ae
      Linus Torvalds committed
      The pipe rework ends up having been extra painful, partly because of
      actual bugs with ordering and caching of the pipe state, but also
      because of subtle performance issues.
      
      In particular, the pipe rework caused the kernel build to inexplicably
      slow down.
      
      The reason turns out to be that the GNU make jobserver (which limits the
      parallelism of the build) uses a pipe to implement a "token" system: a
      parallel submake will read a character from the pipe to get the job
      token before starting a new job, and will write a character back to the
      pipe when it is done.  The overall job limit is thus easily controlled
      by just writing the appropriate number of initial token characters into
      the pipe.
      
      But to work well, that really means that the old behavior of write
      wakeups being synchronous (WF_SYNC) is very important - when the pipe
      writer wakes up a reader, we want the reader to actually get scheduled
      immediately.  Otherwise you lose the parallelism of the build.
      
      The pipe rework lost that synchronous wakeup on write, and we had
      clearly all forgotten the reasons and rules for it.
      
      This rewrites the pipe write wakeup logic to do the required sync
      wakeups, but also clarifies the logic and avoids extraneous wakeups.
      
      It also ends up adding a number of comments about what it does and why,
      so that we hopefully don't end up forgetting about this next time we
      change this code.
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b6b26ae
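The token scheme described in this commit message can be sketched in user space with an ordinary pipe. This is a minimal illustration, not GNU make's actual implementation: the function names `jobserver_init`, `jobserver_acquire`, and `jobserver_release` and the `'+'` token byte are assumptions made for the example.

```c
#include <unistd.h>

/* Hypothetical helper names; GNU make's real jobserver code differs. */

/* Create the token pipe and preload it with `limit` one-byte tokens.
 * Returns 0 on success, -1 on error. */
int jobserver_init(int fds[2], int limit)
{
    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < limit; i++) {
        char token = '+';
        if (write(fds[1], &token, 1) != 1)
            return -1;
    }
    return 0;
}

/* Take one token before starting a job.
 * Blocks while the pipe is empty, i.e. while all job slots are in use. */
int jobserver_acquire(const int fds[2])
{
    char token;
    return read(fds[0], &token, 1) == 1 ? 0 : -1;
}

/* Hand the token back when the job is done.
 * This write is the one that may wake a reader blocked in acquire. */
int jobserver_release(const int fds[2])
{
    char token = '+';
    return write(fds[1], &token, 1) == 1 ? 0 : -1;
}
```

A caller would run `jobserver_init(fds, N)` once, then wrap each job in an acquire/release pair. The release-side `write()` is the wakeup path whose synchronous behavior the commit restores.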
    • L
      pipe: fix poll/select race introduced by the pipe rework · ad910e36
      Linus Torvalds committed
      The kernel wait queues have a basic rule to them: you add yourself to
      the wait-queue first, and then you check the things that you're going to
      wait on.  That avoids the races with the event you're waiting for.
      
      The same goes for poll/select logic: the "poll_wait()" goes first, and
      then you check the things you're polling for.
      
      Of course, if you use locking, the ordering doesn't matter since the
      lock will serialize with anything that changes the state you're looking
      at. That's not the case here, though.
      
      So move the poll_wait() first in pipe_poll(), before you start looking
      at the pipe state.
      
      Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad910e36
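From user space, the state that pipe_poll() reports is what `poll()` returns for a pipe descriptor. The following is a minimal sketch of that user-space view only; it does not reproduce the kernel-side ordering fix, and the helper name `readable_now` is an assumption made for the example.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel, without blocking, whether `fd`
 * currently has data to read.
 * Returns 1 if readable, 0 if not, -1 on error. */
int readable_now(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, 0);   /* timeout 0: report the current state only */

    if (n < 0)
        return -1;
    return (n > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

On a freshly created pipe, `readable_now()` on the read end should report 0, and 1 once a byte has been written to the write end.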
    • P
      nfsd: depend on CRYPTO_MD5 for legacy client tracking · 38a2204f
      Patrick Steinhardt committed
      The legacy client tracking infrastructure of nfsd makes use of MD5 to
      derive a client's recovery directory name. As the nfsd module doesn't
      declare any dependency on CRYPTO_MD5, though, it may fail to allocate
      the hash if the kernel was compiled without it. As a result, generation
      of client recovery directories will fail with the following error:
      
          NFSD: unable to generate recoverydir name
      
      The explicit dependency on CRYPTO_MD5 was removed as redundant back in
      6aaa67b5 (NFSD: Remove redundant "select" clauses in fs/Kconfig
      2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
      This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
      df486a25 (NFS: Fix the selection of security flavours in Kconfig) at
      a later point.
      
      Fix the issue by adding back an explicit dependency on CRYPTO_MD5.
      
      Fixes: df486a25 (NFS: Fix the selection of security flavours in Kconfig)
      Signed-off-by: Patrick Steinhardt <ps@pks.im>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      38a2204f
    • O
      NFSD fixing possible null pointer derefering in copy offload · 18f428d4
      Olga Kornievskaia committed
      Static checker revealed possible error path leading to possible
      NULL pointer dereferencing.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: e0639dc5: ("NFSD introduce async copy feature")
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      18f428d4
  2. 07 Dec, 2019 2 commits
  3. 06 Dec, 2019 2 commits
    • D
      pipe: Fix missing mask update after pipe_wait() · 8f868d68
      David Howells committed
      Fix pipe_write() to not cache the ring index mask and max_usage as their
      values are invalidated by calling pipe_wait() because the latter
      function drops the pipe lock, thereby allowing F_SETPIPE_SZ to change them.
      Without this, pipe_write() may subsequently miscalculate the array
      indices and pipe fullness, leading to an oops like the following:
      
        BUG: KASAN: slab-out-of-bounds in pipe_write+0xc25/0xe10 fs/pipe.c:481
        Write of size 8 at addr ffff8880771167a8 by task syz-executor.3/7987
        ...
        CPU: 1 PID: 7987 Comm: syz-executor.3 Not tainted 5.4.0-rc2-syzkaller #0
        ...
        Call Trace:
          pipe_write+0xc25/0xe10 fs/pipe.c:481
          call_write_iter include/linux/fs.h:1895 [inline]
          new_sync_write+0x3fd/0x7e0 fs/read_write.c:483
          __vfs_write+0x94/0x110 fs/read_write.c:496
          vfs_write+0x18a/0x520 fs/read_write.c:558
          ksys_write+0x105/0x220 fs/read_write.c:611
          __do_sys_write fs/read_write.c:623 [inline]
          __se_sys_write fs/read_write.c:620 [inline]
          __x64_sys_write+0x6e/0xb0 fs/read_write.c:620
          do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is not a problem for pipe_read() as the mask is recalculated on
      each pass of the loop, after pipe_wait() has been called.
      
      Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
      Reported-by: syzbot+838eb0878ffd51f27c41@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      [ Changed it to use a temporary variable 'mask' to avoid long lines -Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f868d68
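The failure mode described in this commit message can be illustrated outside the kernel with a toy power-of-two ring. This is a sketch under stated assumptions: `struct ring` and both helpers are hypothetical, not the kernel's data structures, and the real F_SETPIPE_SZ resize path is more involved than simply changing the size field.

```c
/* Hypothetical miniature of a power-of-two ring; not the kernel's struct. */
struct ring {
    unsigned int head;      /* free-running producer index */
    unsigned int ring_size; /* number of slots, always a power of two */
};

/* Slot for the next write, recomputing the mask from the current size.
 * This is the safe pattern: nothing about the size is cached. */
unsigned int ring_slot(const struct ring *r)
{
    unsigned int mask = r->ring_size - 1;
    return r->head & mask;
}

/* Same computation, but with a mask the caller cached earlier.
 * If the ring shrank after the mask was cached, the result can be
 * greater than or equal to ring_size, i.e. an out-of-bounds index. */
unsigned int ring_slot_cached(const struct ring *r, unsigned int cached_mask)
{
    return r->head & cached_mask;
}
```

For example, with `head = 5` and a mask cached while the ring had 16 slots, shrinking the ring to 4 slots makes `ring_slot_cached()` return 5, which is past the end, while `ring_slot()` returns 1.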
    • D
      pipe: Remove assertion from pipe_poll() · 8c7b8c34
      David Howells committed
      An assertion check was added to pipe_poll() to make sure that the ring
      occupancy isn't seen to overflow the ring size.  However, since no locks
      are held when the three values are read, it is possible for F_SETPIPE_SZ
      to intervene and muck up the calculation, thereby causing the oops.
      
      Fix this by simply removing the assertion and accepting that the
      calculation might be approximate.
      
      Note that the previous code also had a similar issue, though there was
      no assertion check, since the occupancy counter and the ring size were
      not read with a lock held, so it's possible that the poll check might
      have malfunctioned then too.
      
      Also wake up all the waiters so that they can reissue their checks if
      there was a competing read or write.
      
      Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
      Reported-by: syzbot+d37abaade33a934f16f2@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c7b8c34
  4. 05 Dec, 2019 15 commits
  5. 04 Dec, 2019 2 commits
    • M
      orangefs: posix open permission checking... · f9bbb682
      Mike Marshall committed
      Orangefs has no open, and orangefs checks file permissions
      on each file access. Posix requires that file permissions
      be checked on open and nowhere else. Orangefs-through-the-kernel
      needs to seem posix compliant.
      
      The VFS opens files, even if the filesystem provides no
      method. We can see if a file was successfully opened for
      read and/or for write by looking at file->f_mode.
      
      When writes are flowing from the page cache, file is no
      longer available. We can trust the VFS to have checked
      file->f_mode before writing to the page cache.
      
      The mode of a file might change between when it is opened
      and IO commences, or it might be created with an arbitrary mode.
      
      We'll make sure we don't hit EACCES during the IO stage by
      using UID 0. Some of the time we have access without changing
      to UID 0 - how to check?
      Signed-off-by: Mike Marshall <hubcap@omnibond.com>
      f9bbb682
    • J
      io_uring: handle connect -EINPROGRESS like -EAGAIN · 87f80d62
      Jens Axboe committed
      Right now we return it to userspace, which means the application has
      to poll for the socket to be writeable. Let's just treat it like
      -EAGAIN and have io_uring handle it internally, this makes it much
      easier to use.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      87f80d62
  6. 03 Dec, 2019 9 commits
  7. 02 Dec, 2019 3 commits
    • J
      io_uring: use current task creds instead of allocating a new one · 0b8c0ec7
      Jens Axboe committed
      syzbot reports:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
        kthread+0x361/0x430 kernel/kthread.c:255
        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace f2e1a4307fbe2245 ]---
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is caused by slab fault injection triggering a failure in
      prepare_creds(). We don't actually need to create a copy of the creds
      as we're not modifying it, we just need a reference on the current task
      creds. This avoids the failure case as well, and propagates the const
      throughout the stack.
      
      Fixes: 181e448d ("io_uring: async workers should inherit the user creds")
      Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0b8c0ec7
    • M
      userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK · 3c1c24d9
      Mike Rapoport committed
      A while ago Andy noticed
      (http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
      that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
      security implications.
      
      As the first step of the solution the following patch limits the availability
      of UFFD_FEATURE_EVENT_FORK to those having CAP_SYS_PTRACE.
      
      The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.
      
      Yet, if there are other users of non-cooperative userfaultfd that run
      without CAP_SYS_PTRACE, they would be broken :(
      
      Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
      descriptor table from the read() implementation of uffd, which may have
      security implications for unprivileged use of the userfaultfd.
      
      Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
      CAP_SYS_PTRACE.
      
      Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Nosh Minwalla <nosh@google.com>
      Cc: Pavel Emelyanov <ovzxemul@gmail.com>
      Cc: Tim Murray <timmurray@google.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c1c24d9
    • A
      fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register() · 9d4678eb
      Andrea Arcangeli committed
      If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they
      need to be cleared.  Currently setting UFFDIO_REGISTER_MODE_WP returns
      -EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
      is applied.
      
      Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Wei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9d4678eb