1. 12 8月, 2017 1 次提交
  2. 11 8月, 2017 4 次提交
  3. 10 8月, 2017 1 次提交
  4. 09 8月, 2017 1 次提交
  5. 06 8月, 2017 14 次提交
  6. 05 8月, 2017 2 次提交
  7. 03 8月, 2017 4 次提交
    • A
      fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio · 61c12b49
      Ashish Samant 提交于
      Commit 8fba54ae ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes
      the ITER_BVEC page deadlock for direct io in fuse by checking in
      fuse_direct_io(), whether the page is a bvec page or not, before locking
      it.  However, this check is missed when the "async_dio" mount option is
      enabled.  In this case, set_page_dirty_lock() is called from the req->end
      callback in request_end(), when the fuse thread is returning from userspace
      to respond to the read request.  This will cause the same deadlock because
      the bvec condition is not checked in this path.
      
      Here is the stack of the deadlocked thread, while returning from userspace:
      
      [13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds.
      [13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
      this message.
      [13706.658788] glusterfs       D ffffffff816c80f0     0  3006      1
      0x00000080
      [13706.658797]  ffff8800d6713a58 0000000000000086 ffff8800d9ad7000
      ffff8800d9ad5400
      [13706.658799]  ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0
      7fffffffffffffff
      [13706.658801]  0000000000000002 ffffffff816c80f0 ffff8800d6713a78
      ffffffff816c790e
      [13706.658803] Call Trace:
      [13706.658809]  [<ffffffff816c80f0>] ? bit_wait_io_timeout+0x80/0x80
      [13706.658811]  [<ffffffff816c790e>] schedule+0x3e/0x90
      [13706.658813]  [<ffffffff816ca7e5>] schedule_timeout+0x1b5/0x210
      [13706.658816]  [<ffffffff81073ffb>] ? gup_pud_range+0x1db/0x1f0
      [13706.658817]  [<ffffffff810668fe>] ? kvm_clock_read+0x1e/0x20
      [13706.658819]  [<ffffffff81066909>] ? kvm_clock_get_cycles+0x9/0x10
      [13706.658822]  [<ffffffff810f5792>] ? ktime_get+0x52/0xc0
      [13706.658824]  [<ffffffff816c6f04>] io_schedule_timeout+0xa4/0x110
      [13706.658826]  [<ffffffff816c8126>] bit_wait_io+0x36/0x50
      [13706.658828]  [<ffffffff816c7d06>] __wait_on_bit_lock+0x76/0xb0
      [13706.658831]  [<ffffffffa0545636>] ? lock_request+0x46/0x70 [fuse]
      [13706.658834]  [<ffffffff8118800a>] __lock_page+0xaa/0xb0
      [13706.658836]  [<ffffffff810c8500>] ? wake_atomic_t_function+0x40/0x40
      [13706.658838]  [<ffffffff81194d08>] set_page_dirty_lock+0x58/0x60
      [13706.658841]  [<ffffffffa054d968>] fuse_release_user_pages+0x58/0x70 [fuse]
      [13706.658844]  [<ffffffffa0551430>] ? fuse_aio_complete+0x190/0x190 [fuse]
      [13706.658847]  [<ffffffffa0551459>] fuse_aio_complete_req+0x29/0x90 [fuse]
      [13706.658849]  [<ffffffffa05471e9>] request_end+0xd9/0x190 [fuse]
      [13706.658852]  [<ffffffffa0549126>] fuse_dev_do_write+0x336/0x490 [fuse]
      [13706.658854]  [<ffffffffa054963e>] fuse_dev_write+0x6e/0xa0 [fuse]
      [13706.658857]  [<ffffffff812a9ef3>] ? security_file_permission+0x23/0x90
      [13706.658859]  [<ffffffff81205300>] do_iter_readv_writev+0x60/0x90
      [13706.658862]  [<ffffffffa05495d0>] ? fuse_dev_splice_write+0x350/0x350
      [fuse]
      [13706.658863]  [<ffffffff812062a1>] do_readv_writev+0x171/0x1f0
      [13706.658866]  [<ffffffff810b3d00>] ? try_to_wake_up+0x210/0x210
      [13706.658868]  [<ffffffff81206361>] vfs_writev+0x41/0x50
      [13706.658870]  [<ffffffff81206496>] SyS_writev+0x56/0xf0
      [13706.658872]  [<ffffffff810257a1>] ? syscall_trace_leave+0xf1/0x160
      [13706.658874]  [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71
      
      Fix this by making should_dirty a fuse_io_priv parameter that can be
      checked in fuse_aio_complete_req().
      Reported-by: NTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: NAshish Samant <ashish.samant@oracle.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      61c12b49
    • J
      ocfs2: don't clear SGID when inheriting ACLs · 19ec8e48
      Jan Kara 提交于
      When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
      set, DIR1 is expected to have SGID bit set (and owning group equal to
      the owning group of 'DIR0').  However when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
      'DIR1' to get cleared if user is not member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
      into ocfs2_iop_set_acl().  That way the function will not be called when
      inheriting ACLs which is what we want as it prevents SGID bit clearing
      and the mode has been properly set by posix_acl_create() anyway.  Also
      posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
      mode itself.
      
      Fixes: 07393101 ("posix_acl: Clear SGID bit when setting file permissions")
      Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19ec8e48
    • M
      userfaultfd: non-cooperative: flush event_wqh at release time · 5a18b64e
      Mike Rapoport 提交于
      There may still be threads waiting on event_wqh at the time the
      userfault file descriptor is closed.  Flush the events wait-queue to
      prevent waiting threads from hanging.
      
      Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
      Fixes: 9cd75c3c ("userfaultfd: non-cooperative: add ability to report
      non-PF events from uffd descriptor")
      Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a18b64e
    • M
      userfaultfd_zeropage: return -ENOSPC in case mm has gone · 9d95aa4b
      Mike Rapoport 提交于
      In the non-cooperative userfaultfd case, the process exit may race with
      outstanding mcopy_atomic called by the uffd monitor.  Returning -ENOSPC
      instead of -EINVAL when mm is already gone will allow uffd monitor to
      distinguish this case from other error conditions.
      
      Unfortunately I overlooked userfaultfd_zeropage when updating
      userfaultd_copy().
      
      Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
      Fixes: 96333187 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
      Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d95aa4b
  8. 02 8月, 2017 2 次提交
  9. 31 7月, 2017 6 次提交
  10. 29 7月, 2017 1 次提交
    • B
      NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter · b7dbcc0e
      Benjamin Coddington 提交于
      nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
      same region protected by the wait_queue's lock after checking for a
      notification from CB_NOTIFY_LOCK callback.  However, after releasing that
      lock, a wakeup for that task may race in before the call to
      freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
      freezable_schedule_timeout_interruptible() will set the state back to
      TASK_INTERRUPTIBLE before the task will sleep.  The result is that the task
      will sleep for the entire duration of the timeout.
      
      Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
      freezable_schedule_timout() instead.
      
      Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
      Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      b7dbcc0e
  11. 27 7月, 2017 3 次提交
    • N
      NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43
      NeilBrown 提交于
      posix_fallocate() will allocate space in an NFS file by considering
      the last byte of every 4K block.  If it is before EOF, it will read
      the byte and if it is zero, a zero is written out.  If it is after EOF,
      the zero is unconditionally written.
      
      For the blocks beyond EOF, if NFS believes its cache is valid, it will
      expand these writes to write full pages, and then will merge the pages.
      This results if (typically) 1MB writes.  If NFS believes its cache is
      not valid (particularly if NFS_INO_INVALID_DATA or
      NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
      send the individual 1-byte writes. This results in (typically) 256 times
      as many RPC requests, and can be substantially slower.
      
      Currently nfs_revalidate_mapping() is only used when reading a file or
      mmapping a file, as these are times when the content needs to be
      up-to-date.  Writes don't generally need the cache to be up-to-date, but
      writes beyond EOF can benefit, particularly in the posix_fallocate()
      case.
      
      So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
      i.e. when there is a gap between the end of the file and the start of
      the write.  If the cache is thought to be out of date (as happens after
      taking a file lock), this will cause a GETATTR, and the two flags
      mentioned above will be cleared.  With this, posix_fallocate() on a
      newly locked file does not generate excessive tiny writes.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      6ba80d43
    • N
      NFS: invalidate file size when taking a lock. · 442ce049
      NeilBrown 提交于
      Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
      for writing"), NFS would revalidate, or invalidate, the file size when
      taking a lock.  Since that commit it only invalidates the file content.
      
      If the file size is changed on the server while wait for the lock, the
      client will have an incorrect understanding of the file size and could
      corrupt data.  This particularly happens when writing beyond the
      (supposed) end of file and can be easily be demonstrated with
      posix_fallocate().
      
      If an application opens an empty file, waits for a write lock, and then
      calls posix_fallocate(), glibc will determine that the underlying
      filesystem doesn't support fallocate (assuming version 4.1 or earlier)
      and will write out a '0' byte at the end of each 4K page in the region
      being fallocated that is after the end of the file.
      NFS will (usually) detect that these writes are beyond EOF and will
      expand them to cover the whole page, and then will merge the pages.
      Consequently, NFS will write out large blocks of zeroes beyond where it
      thought EOF was.  If EOF had moved, the pre-existing part of the file
      will be over-written.  Locking should have protected against this,
      but it doesn't.
      
      This patch restores the use of nfs_zap_caches() which invalidated the
      cached attributes.  When posix_fallocate() asks for the file size, the
      request will go to the server and get a correct answer.
      
      cc: stable@vger.kernel.org (v4.8+)
      Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      442ce049
    • A
      NFS: Use raw NFS access mask in nfs4_opendata_access() · 1e6f2095
      Anna Schumaker 提交于
      Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache") changed how the access results are stored after an
      access() call.  An NFS v4 OPEN might have access bits returned with the
      opendata, so we should use the NFS4_ACCESS values when determining the
      return value in nfs4_opendata_access().
      
      Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache")
      Reported-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Tested-by: NTakashi Iwai <tiwai@suse.de>
      1e6f2095
  12. 26 7月, 2017 1 次提交