1. 08 10月, 2016 2 次提交
  2. 06 10月, 2016 1 次提交
  3. 27 9月, 2016 7 次提交
    • J
      nfsd4: setclientid_confirm with unmatched verifier should fail · 7d22fc11
      J. Bruce Fields 提交于
      A setclientid_confirm with (clientid, verifier) both matching an
      existing confirmed record is assumed to be a replay, but if the verifier
      doesn't match, it shouldn't be.
      
      This would be a very rare case, except that clients following
      https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
      failure.
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7d22fc11
    • J
      nfsd: randomize SETCLIENTID reply to help distinguish servers · ebd7c72c
      J. Bruce Fields 提交于
      NFSv4.1 has built-in trunking support that allows a client to determine
      whether two connections to two different IP addresses are actually to
      the same server.  NFSv4.0 does not, but RFC 7931 attempts to provide
      clients a means to do this, basically by performing a SETCLIENTID to one
      address and confirming it with a SETCLIENTID_CONFIRM to the other.
      
      Linux clients since 05f4c350 "NFS: Discover NFSv4 server trunking
      when mounting" implement a variation on this suggestion.  It is possible
      that other clients do too.
      
      This depends on the clientid and verifier not being accepted by an
      unrelated server.  Since both are 64-bit values, that would be very
      unlikely if they were random numbers.  But they aren't:
      
      knfsd generates the 64-bit clientid by concatenating the 32-bit boot
      time (in seconds) and a counter.  This makes collisions between
      clientids generated by the same server extremely unlikely.  But
      collisions are very likely between clientids generated by servers that
      boot at the same time, and it's quite common for multiple servers to
      boot at the same time.  The verifier is a concatenation of the
      SETCLIENTID time (in seconds) and a counter, so again collisions between
      different servers are likely if multiple SETCLIENTIDs are done at the
      same time, which is a common case.
      
      Therefore recent NFSv4.0 clients may decide two different servers are
      really the same, and mount a filesystem from the wrong server.
      
      Fortunately the Linux client, since 55b9df93 "nfsv4/v4.1: Verify the
      client owner id during trunking detection", only does this when given
      the non-default "migration" mount option.
      
      The fault is really with RFC 7931, and needs a client fix, but in the
      meantime we can mitigate the chance of these collisions by randomizing
      the starting value of the counters used to generate clientids and
      verifiers.
      Reported-by: NFrank Sorenson <fsorenso@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ebd7c72c
    • J
      nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies · 19e4c347
      Jeff Layton 提交于
      If we are using v4.1+, then we can send notification when contended
      locks become free. Inform the client of that fact.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      19e4c347
    • J
      nfsd: add a LRU list for blocked locks · 7919d0a2
      Jeff Layton 提交于
      It's possible for a client to call in on a lock that is blocked for a
      long time, but discontinue polling for it. A malicious client could
      even set a lock on a file, and then spam the server with failing lock
      requests from different lockowners that pile up in a DoS attack.
      
      Add the blocked lock structures to a per-net namespace LRU when hashing
      them, and timestamp them. If the lock request is not revisited after a
      lease period, we'll drop it under the assumption that the client is no
      longer interested.
      
      This also gives us a mechanism to clean up these objects at server
      shutdown time as well.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7919d0a2
    • J
      nfsd: have nfsd4_lock use blocking locks for v4.1+ locks · 76d348fa
      Jeff Layton 提交于
      Create a new per-lockowner+per-inode structure that contains a
      file_lock. Have nfsd4_lock add this structure to the lockowner's list
      prior to setting the lock. Then call the vfs and request a blocking lock
      (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
      back, then we dequeue the block structure and free it. When the next
      lock request comes in, we'll look for an existing block for the same
      filehandle and dequeue and reuse it if there is one.
      
      When the lock comes free (a'la an lm_notify call), we dequeue it
      from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
      inform the client that it should retry the lock request.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      76d348fa
    • J
      nfsd: plumb in a CB_NOTIFY_LOCK operation · a188620e
      Jeff Layton 提交于
      Add the encoding/decoding for CB_NOTIFY_LOCK operations.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a188620e
    • V
      NFSD: fix corruption in notifier registration · 1eca45f8
      Vasily Averin 提交于
      By design notifier can be registered once only, however nfsd registers
      the same inetaddr notifiers per net-namespace.  When this happen it
      corrupts list of notifiers, as result some notifiers can be not called
      on proper event, traverse on list can be cycled forever, and second
      unregister can access already freed memory.
      
      Cc: stable@vger.kernel.org
      fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1eca45f8
  4. 23 9月, 2016 1 次提交
  5. 17 9月, 2016 2 次提交
    • J
      nfsd: eliminate cb_minorversion field · 89dfdc96
      Jeff Layton 提交于
      We already have that info in the client pointer. No need to pass around
      a copy.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      89dfdc96
    • J
      nfsd: don't set a FL_LAYOUT lease for flexfiles layouts · 1983a66f
      Jeff Layton 提交于
      We currently can hit a deadlock (of sorts) when trying to use flexfiles
      layouts with XFS. XFS will call break_layout when something wants to
      write to the file. In the case of the (super-simple) flexfiles layout
      driver in knfsd, the MDS and DS are the same machine.
      
      The client can get a layout and then issue a v3 write to do its I/O. XFS
      will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
      be issued to the client. The client however can't return the layout
      until the v3 WRITE completes, but XFS won't allow the write to proceed
      until the layout is returned.
      
      Christoph says:
      
          XFS only cares about block-like layouts where the client has direct
          access to the file blocks.  I'd need to look how to propagate the
          flag into break_layout, but in principle we don't need to do any
          recalls on truncate ever for file and flexfile layouts.
      
      If we're never going to recall the layout, then we don't even need to
      set the lease at all. Just skip doing so on flexfiles layouts by
      adding a new flag to struct nfsd4_layout_ops and skipping the lease
      setting and removal when that flag is true.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1983a66f
  6. 10 9月, 2016 4 次提交
    • E
      fscrypto: require write access to mount to set encryption policy · ba63f23d
      Eric Biggers 提交于
      Since setting an encryption policy requires writing metadata to the
      filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
      Otherwise, a user could cause a write to a frozen or readonly
      filesystem.  This was handled correctly by f2fs but not by ext4.  Make
      fscrypt_process_policy() handle it rather than relying on the filesystem
      to get it right.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Acked-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ba63f23d
    • E
      fscrypto: only allow setting encryption policy on directories · 002ced4b
      Eric Biggers 提交于
      The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
      policy on nondirectory files.  This was unintentional, and in the case
      of nonempty regular files did not behave as expected because existing
      data was not actually encrypted by the ioctl.
      
      In the case of ext4, the user could also trigger filesystem errors in
      ->empty_dir(), e.g. due to mismatched "directory" checksums when the
      kernel incorrectly tried to interpret a regular file as a directory.
      
      This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
      kernels v4.6 and later.  It appears that older kernels only permitted
      directories and that the check was accidentally lost during the
      refactoring to share the file encryption code between ext4 and f2fs.
      
      This patch restores the !S_ISDIR() check that was present in older
      kernels.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      002ced4b
    • E
      fscrypto: add authorization check for setting encryption policy · 163ae1c6
      Eric Biggers 提交于
      On an ext4 or f2fs filesystem with file encryption supported, a user
      could set an encryption policy on any empty directory(*) to which they
      had readonly access.  This is obviously problematic, since such a
      directory might be owned by another user and the new encryption policy
      would prevent that other user from creating files in their own directory
      (for example).
      
      Fix this by requiring inode_owner_or_capable() permission to set an
      encryption policy.  This means that either the caller must own the file,
      or the caller must have the capability CAP_FOWNER.
      
      (*) Or also on any regular file, for f2fs v4.6 and later and ext4
          v4.8-rc1 and later; a separate bug fix is coming for that.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      163ae1c6
    • D
      mm: fix show_smap() for zone_device-pmd ranges · ca120cf6
      Dan Williams 提交于
      Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
      currently results in the following VM_BUG_ONs:
      
       kernel BUG at mm/huge_memory.c:1105!
       task: ffff88045f16b140 task.stack: ffff88045be14000
       RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
       [..]
       Call Trace:
        [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
        [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
        [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
        [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
        [<ffffffff81307656>] show_smap+0xa6/0x2b0
      
       kernel BUG at fs/proc/task_mmu.c:585!
       RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
       Call Trace:
        [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
        [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
        [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
        [<ffffffff81307696>] show_smap+0xa6/0x2b0
      
      These locations are sanity checking page flags that must be set for an
      anonymous transparent huge page, but are not set for the zone_device
      pages associated with dax mappings.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ca120cf6
  7. 06 9月, 2016 2 次提交
    • W
      btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress · ce129655
      Wang Xiaoguang 提交于
      In btrfs_async_reclaim_metadata_space(), we use ticket's address to
      determine whether asynchronous metadata reclaim work is making progress.
      
      	ticket = list_first_entry(&space_info->tickets,
      				  struct reserve_ticket, list);
      	if (last_ticket == ticket) {
      		flush_state++;
      	} else {
      		last_ticket = ticket;
      		flush_state = FLUSH_DELAYED_ITEMS_NR;
      		if (commit_cycles)
      			commit_cycles--;
      	}
      
      But indeed it's wrong, we should not rely on local variable's address to
      do this check, because addresses may be same. In my test environment, I
      dd one 168MB file in a 256MB fs, found that for this file, every time
      wait_reserve_ticket() called, local variable ticket's address is same,
      
      For above codes, assume a previous ticket's address is addrA, last_ticket
      is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
      wake up it, then another ticket is added, but with the same address addrA,
      now last_ticket will be same to current ticket, then current ticket's flush
      work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
      which may result in some enospc issues(I have seen this in my test machine).
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ce129655
    • C
      Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns · cbd60aa7
      Chris Mason 提交于
      We use a btrfs_log_ctx structure to pass information into the
      tree log commit, and get error values out.  It gets added to a per
      log-transaction list which we walk when things go bad.
      
      Commit d1433deb added an optimization to skip waiting for the log
      commit, but didn't take root_log_ctx out of the list.  This
      patch makes sure we remove things before exiting.
      Signed-off-by: NChris Mason <clm@fb.com>
      Fixes: d1433deb
      cc: stable@vger.kernel.org # 3.15+
      cbd60aa7
  8. 05 9月, 2016 3 次提交
  9. 04 9月, 2016 1 次提交
    • L
      devpts: return NULL pts 'priv' entry for non-devpts nodes · 3e423945
      Linus Torvalds 提交于
      In commit 8ead9dd5 ("devpts: more pty driver interface cleanups") I
      made devpts_get_priv() just return the dentry->fs_data directly.  And
      because I thought it wouldn't happen, I added a warning if you ever saw
      a pts node that wasn't on devpts.
      
      And no, that warning never triggered under any actual real use, but you
      can trigger it by creating nonsensical pts nodes by hand.
      
      So just revert the warning, and make devpts_get_priv() return NULL for
      that case like it used to.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org # 4.6+
      Cc: Eric W Biederman" <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e423945
  10. 01 9月, 2016 17 次提交