1. 11 Sep, 2014 17 commits
  2. 09 Sep, 2014 2 commits
    • nfs: revert "nfs4: queue free_lock_state job submission to nfsiod" · 0c0e0d3c
      Jeff Layton committed
      This reverts commit 49a4bda2.
      
      Christoph reported an oops due to the above commit:
      
      generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
      SMP
      [ 2187.042899] Modules linked in:
      [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
      [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [ 2187.044287] Workqueue: nfsiod free_lock_state_work
      [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
      [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>]  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
      [ 2187.044287] RSP: 0018:ffff88007a4efd58  EFLAGS: 00010296
      [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
      [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
      [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
      [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
      [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
      [ 2187.044287] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      [ 2187.044287] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
      [ 2187.044287] Stack:
      [ 2187.044287]  ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
      [ 2187.044287]  000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
      [ 2187.044287]  0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
      [ 2187.044287] Call Trace:
      [ 2187.044287]  [<ffffffff810cc877>] process_one_work+0x1c7/0x490
      [ 2187.044287]  [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490
      [ 2187.044287]  [<ffffffff810cd569>] worker_thread+0x119/0x4f0
      [ 2187.044287]  [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10
      [ 2187.044287]  [<ffffffff810cd450>] ? init_pwq+0x190/0x190
      [ 2187.044287]  [<ffffffff810d3c6f>] kthread+0xdf/0x100
      [ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
      [ 2187.044287]  [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0
      [ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
      [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
      [ 2187.044287] RIP  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
      [ 2187.044287]  RSP <ffff88007a4efd58>
      [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---
      
      The reverted patch was originally needed because the fl_release_private
      operation couldn't sleep. With commit ed9814d8 (locks: defer freeing
      locks in locks_delete_lock until after i_lock has been dropped), this is
      no longer a problem, so we can revert it.
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: fix kernel warning when removing proc entry · 21e81002
      Cong Wang committed
      I saw the following kernel warning:
      
      [ 1852.321222] ------------[ cut here ]------------
      [ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
      [ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
      [ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
      [ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 1852.354992] Workqueue: netns cleanup_net
      [ 1852.358701]  0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18
      [ 1852.366474]  ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238
      [ 1852.373507]  ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68
      [ 1852.380224] Call Trace:
      [ 1852.381976]  [<ffffffff819c03e9>] dump_stack+0x4d/0x66
      [ 1852.385495]  [<ffffffff810744ee>] warn_slowpath_common+0x7a/0x93
      [ 1852.389869]  [<ffffffff811e0e6e>] ? remove_proc_entry+0x154/0x16b
      [ 1852.393987]  [<ffffffff8107457b>] warn_slowpath_fmt+0x4c/0x4e
      [ 1852.397999]  [<ffffffff811e0e6e>] remove_proc_entry+0x154/0x16b
      [ 1852.402034]  [<ffffffff8129c73d>] nfs_fs_proc_net_exit+0x53/0x56
      [ 1852.406136]  [<ffffffff812a103b>] nfs_net_exit+0x12/0x1d
      [ 1852.409774]  [<ffffffff81785bc9>] ops_exit_list+0x44/0x55
      [ 1852.413529]  [<ffffffff81786389>] cleanup_net+0xee/0x182
      [ 1852.417198]  [<ffffffff81088c9e>] process_one_work+0x209/0x40d
      [ 1852.502320]  [<ffffffff81088bf7>] ? process_one_work+0x162/0x40d
      [ 1852.587629]  [<ffffffff810890c1>] worker_thread+0x1f0/0x2c7
      [ 1852.673291]  [<ffffffff81088ed1>] ? process_scheduled_works+0x2f/0x2f
      [ 1852.759470]  [<ffffffff8108e079>] kthread+0xc9/0xd1
      [ 1852.843099]  [<ffffffff8109427f>] ? finish_task_switch+0x3a/0xce
      [ 1852.926518]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
      [ 1853.008565]  [<ffffffff819cbeac>] ret_from_fork+0x7c/0xb0
      [ 1853.076477]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
      [ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---
      
      It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
      but remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().
      
      Fixes: commit 65b38851 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Dan Aloni <dan@kernelim.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      [Trond: replace uses of remove_proc_entry() with remove_proc_subtree()
      as suggested by Al Viro]
      Cc: stable@vger.kernel.org # 3.4.x : 65b38851: NFS: Fix /proc/fs/nfsfs/servers
      Cc: stable@vger.kernel.org # 3.4.x
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  3. 27 Aug, 2014 3 commits
  4. 23 Aug, 2014 8 commits
  5. 08 Aug, 2014 1 commit
    • dcache: d_obtain_alias callers don't all want DISCONNECTED · 1a0a397e
      J. Bruce Fields committed
      There are a few d_obtain_alias callers that are using it to get the
      root of a filesystem which may already have an alias somewhere else.
      
      This is not the same as the filehandle-lookup case, and none of them
      actually need DCACHE_DISCONNECTED set.
      
      It isn't really a serious problem, but it would really be clearer if we
      reserved DCACHE_DISCONNECTED for those cases where it's actually needed.
      
      In the btrfs case this was causing a spurious printk from
      nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
      dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
      manually in 3a0dfa6a "Btrfs: unset DCACHE_DISCONNECTED when mounting
      default subvol", and this replaces that workaround.
      
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 05 Aug, 2014 3 commits
    • nfs: reject changes to resvport and sharecache during remount · 71a6ec8a
      Scott Mayhew committed
      Commit c8e47028 made it possible to change resvport/noresvport and
      sharecache/nosharecache via a remount operation, neither of which should be
      allowed.
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Fixes: c8e47028 (nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data)
      Cc: stable@vger.kernel.org # 3.16+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: Avoid infinite loop when RELEASE_LOCKOWNER gets an expired error · 5b53dc88
      Kinglong Mee committed
      Fixes commit 60ea6812 (NFS: Migration support for RELEASE_LOCKOWNER).
      If RELEASE_LOCKOWNER gets an expired error, the client enters an infinite loop:
      
      client                            server
         RELEASE_LOCKOWNER(old clid) ----->
                      <--- expired error
         RENEW(old clid)             ----->
                      <--- expired error
         SETCLIENTID                 ----->
                      <--- a new clid
         SETCLIENTID_CONFIRM (new clid) -->
                      <--- ok
         RELEASE_LOCKOWNER(old clid) ----->
                      <--- expired error
         RENEW(new clid)             ----->
                      <-- ok
         RELEASE_LOCKOWNER(old clid) ----->
                      <--- expired error
         RENEW(new clid)             ----->
                      <-- ok
                      ... ...
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      [Trond: replace call to nfs4_async_handle_error() with
       nfs4_schedule_lease_recovery()]
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
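The exchange traced in the commit above can be modelled in a few lines. This is a userspace toy sketch in Python, not kernel code: `ToyServer`, `release_with_retry` and `release_once` are hypothetical names, the server is assumed to accept only the most recently issued client ID, and the "fixed" path only mirrors the idea in Trond's note (schedule lease recovery instead of resending the request).

```python
EXPIRED = "NFS4ERR_EXPIRED"
OK = "OK"


class ToyServer:
    """Hypothetical server that accepts only the latest client ID."""

    def __init__(self):
        self.valid_clid = None
        self._next_clid = 1

    def setclientid(self):
        # Stands in for SETCLIENTID + SETCLIENTID_CONFIRM.
        self.valid_clid = self._next_clid
        self._next_clid += 1
        return self.valid_clid

    def expire(self):
        self.valid_clid = None

    def release_lockowner(self, clid):
        return OK if clid == self.valid_clid else EXPIRED


def release_with_retry(server, stale_clid, max_rounds):
    """Looping behaviour: recover the lease, then resend the same request.

    The resent request still carries the stale client ID, so it can never
    succeed. max_rounds exists only so the toy model terminates.
    """
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        if server.release_lockowner(stale_clid) == OK:
            return rounds, True
        server.setclientid()  # recovery gets a new clid; the request is not rebuilt
    return rounds, False


def release_once(server, stale_clid):
    """Sketch of the fix: on an expired error, flag recovery and do not retry."""
    status = server.release_lockowner(stale_clid)
    recovery_scheduled = status == EXPIRED
    return status, recovery_scheduled
```

In this model `release_with_retry` uses up every round it is given without succeeding, while `release_once` sends a single request and leaves recovery to run on its own.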
    • NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes · 65b38851
      Eric W. Biederman committed
      The usage of pid_ns->child_reaper->nsproxy->net_ns in
      nfs_server_list_open and nfs_client_list_open is not safe.
      
      /proc for a pid namespace can remain mounted after all of the
      processes in that pid namespace have exited.  There are also times
      before the initial process in a pid namespace has started or after the
      initial process in a pid namespace has exited where
      pid_ns->child_reaper can be NULL or stale.  That makes the idiom
      pid_ns->child_reaper->nsproxy a double whammy of problems.
      
      Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and
      /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and
      /proc/net/nfsfs/volumes and add a symlink from the original location,
      and to use seq_open_net as it has been designed.
      
      Cc: stable@vger.kernel.org
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  7. 04 Aug, 2014 6 commits
    • NFS: fix two problems in lookup_revalidate in RCU-walk · 50d77739
      NeilBrown committed
      1/ rcu_dereference isn't correct: that field isn't
         RCU protected.   It could potentially change at any time
         so ACCESS_ONCE might be justified.
      
         changes to ->d_parent are protected by ->d_seq.  However
         that isn't always checked after ->d_revalidate is called,
         so it is safest to keep the double-check that ->d_parent
         hasn't changed at the end of these functions.
      
      2/ in nfs4_lookup_revalidate, "->d_parent" was forgotten.
         So 'parent' was not the parent of 'dentry'.
         This fails safe, as the context is that dentry->d_inode is
         NULL, and the result of parent->d_inode being NULL is
         that -ECHILD is returned, which is always safe.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: allow lockless access to access_cache · f682a398
      NeilBrown committed
      The access cache is used during RCU-walk path lookups, so it is best
      to avoid locking if possible as taking a lock kills concurrency.
      
      The rbtree is not rcu-safe and cannot easily be made so.
      Instead we simply check the last (i.e. most recent) entry on the LRU
      list.  If this doesn't match, then we return -ECHILD and retry in
      lock/refcount mode.
      
      This requires freeing the nfs_access_entry struct with rcu, and
      requires using rcu access primitives when adding entries to the lru, and
      when examining the last entry.
      
      Calling put_rpccred before kfree_rcu looks a bit odd, but as
      put_rpccred already provides rcu protection, we know that the cred will
      not actually be freed until the next grace period, so any concurrent
      access will be safe.
      
      This patch provides about 5% performance improvement on a stat-heavy
      synthetic workload with 4 threads on a 2-core CPU.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
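The lookup strategy this commit describes (check only the most recent LRU entry without locking, otherwise return -ECHILD and retry under the lock) might be sketched as follows. This is a userspace Python model under stated assumptions, not the kernel implementation: `AccessCache` and its methods are hypothetical names, an `OrderedDict` stands in for the rbtree plus LRU list, and a single reference read stands in for the RCU-protected read of the newest entry.

```python
import errno
import threading
from collections import OrderedDict


class AccessCache:
    """Toy access cache: cred -> permission mask (non-negative int)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = OrderedDict()  # oldest first; stands in for rbtree + LRU
        self._newest = None            # (cred, mask), replaced as a whole

    def add(self, cred, mask):
        with self._lock:
            self._entries[cred] = mask
            self._entries.move_to_end(cred)
            # Publish the newest entry as one immutable tuple.
            self._newest = (cred, mask)

    def lookup_lockless(self, cred):
        """Fast path: look at the newest entry only, never take the lock."""
        newest = self._newest  # one read; the tuple itself never changes
        if newest is not None and newest[0] == cred:
            return newest[1]
        return -errno.ECHILD   # caller must retry in locked mode

    def lookup_locked(self, cred):
        """Slow path: full lookup under the lock; None if not cached."""
        with self._lock:
            return self._entries.get(cred)


def lookup(cache, cred):
    """Try the lockless path first and fall back on -ECHILD."""
    result = cache.lookup_lockless(cred)
    if result == -errno.ECHILD:
        return cache.lookup_locked(cred)
    return result
```

The fast path deliberately gives up on anything but the newest entry; in the model, as in the commit text, correctness comes from the fallback, and the lockless check is only an optimisation for the common case.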
    • NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU · 1fa1e384
      NeilBrown committed
      It fails with -ECHILD rather than make an RPC call.
      
      This allows nfs_lookup_revalidate to call it in RCU-walk mode.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU · 912a108d
      NeilBrown committed
      This requires nfs_check_verifier to take an rcu_walk flag, and requires
      an rcu version of nfs_revalidate_inode which returns -ECHILD rather
      than making an RPC call.
      
      With this, nfs_lookup_revalidate can call nfs_neg_need_reval in
      RCU-walk mode.
      
      We can also move the LOOKUP_RCU check past the nfs_check_verifier()
      call in nfs_lookup_revalidate.
      
      If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from
      doing a full check, they return a status indicating that a revalidation
      is required.  As this revalidation will not be possible in RCU_WALK
      mode, -ECHILD will ultimately be returned, which is the desired result.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: support RCU_WALK in nfs_permission() · f3324a2a
      NeilBrown committed
      nfs_permission makes two calls which are not always safe in RCU_WALK,
      rpc_lookup_cred and nfs_do_access.
      
      The second can easily be made rcu-safe by aborting with -ECHILD before
      making the RPC call.
      
      The first can be made rcu-safe by calling rpc_lookup_cred_nonblock()
      instead.
      As this will almost always succeed, we use it even when RCU_WALK
      isn't being used as it still saves some spinlocks in a common case.
      We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock()
      fails and MAY_NOT_BLOCK isn't set.
      
      This optimisation (always trying rpc_lookup_cred_nonblock()) is
      particularly important when a security module is active.
      In that case inode_permission() may return -ECHILD from
      security_inode_permission() even though ->permission() succeeded in
      RCU_WALK mode.
      This leads to may_lookup() retrying inode_permission after performing
      unlazy_walk().  The spinlock that rpc_lookup_cred() takes is often
      more expensive than anything security_inode_permission() does, so that
      spinlock becomes the main bottleneck.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
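The credential lookup order described in this commit can be illustrated with a small model. This is a Python sketch of the control flow only: `cred_for_permission` is a hypothetical name, the two lookup callables stand in for rpc_lookup_cred_nonblock() and rpc_lookup_cred(), and the flag value is used purely as an illustrative bit.

```python
import errno

MAY_NOT_BLOCK = 0x80  # illustrative flag bit for "caller is in RCU-walk"


def cred_for_permission(mask, lookup_nonblock, lookup_blocking):
    """Always try the non-blocking lookup first.

    lookup_nonblock() returns a credential or None when it cannot
    complete without blocking; lookup_blocking() may sleep or take locks.
    """
    cred = lookup_nonblock()
    if cred is not None:
        return cred               # common case: no spinlock taken
    if mask & MAY_NOT_BLOCK:
        return -errno.ECHILD      # RCU-walk: caller must drop to ref-walk
    return lookup_blocking()      # safe to block, use the slow path
```

Trying the non-blocking lookup even when blocking is allowed is the point of the optimisation: the blocking path is reached only when the fast path fails and the caller is permitted to block.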
    • NFS: prepare for RCU-walk support by pushing tests later in code. · d51ac1a8
      NeilBrown committed
      nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission
      all need to understand and handle RCU-walk for NFS to gain the
      benefits of RCU-walk for cached information.
      
      Currently these functions all immediately return -ECHILD
      if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set.
      
      This patch pushes those tests later in the code so that we only abort
      immediately before we enter rcu-unsafe code.  As subsequent patches
      make that rcu-unsafe code rcu-safe, several of these new tests will
      disappear.
      
      With this patch there are several paths through the code which will no
      longer return -ECHILD during an RCU-walk.  However these are mostly
      error paths or other uninteresting cases.
      
      A noteworthy change in nfs_lookup_revalidate is that we don't take
      (or put) the reference to ->d_parent when LOOKUP_RCU is set.
      Rather we rcu_dereference ->d_parent, and check that ->d_inode
      is not NULL.  We also check that ->d_parent hasn't changed after
      all the tests.
      
      In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the
      path that only calls nfs_lookup_revalidate() as that function
      already performs the required test.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>