1. 28 1月, 2014 4 次提交
    • T
      NFS: Fix races in nfs_revalidate_mapping · 17dfeb91
      Trond Myklebust 提交于
      Commit d529ef83 (NFS: fix the handling
      of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces
      a potential race, since it doesn't test the value of nfsi->cache_validity
      and set the bitlock in nfsi->flags atomically.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      17dfeb91
    • J
      sunrpc: turn warn_gssd() log message into a dprintk() · 0ea9de0e
      Jeff Layton 提交于
      The original printk() made sense when the GSSAPI codepaths were called
      only when sec=krb5* was explicitly requested. Now however, in many cases
      the nfs client will try to acquire GSSAPI credentials by default, even
      when it's not requested.
      
      Since we don't have a great mechanism to distinguish between the two
      cases, just turn the pr_warn into a dprintk instead. With this change we
      can also get rid of the ratelimiting.
      
      We do need to keep the EXPORT_SYMBOL(gssd_running) in place since
      auth_gss.ko needs it and sunrpc.ko provides it. We can however,
      eliminate the gssd_running call in the nfs code since that's a bit of a
      layering violation.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      0ea9de0e
    • J
      NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping · d529ef83
      Jeff Layton 提交于
      There is a possible race in how the nfs_invalidate_mapping function is
      handled.  Currently, we go and invalidate the pages in the file and then
      clear NFS_INO_INVALID_DATA.
      
      The problem is that it's possible for a stale page to creep into the
      mapping after the page was invalidated (i.e., via readahead). If another
      writer comes along and sets the flag after that happens but before
      invalidate_inode_pages2 returns then we could clear the flag
      without the cache having been properly invalidated.
      
      So, we must clear the flag first and then invalidate the pages. Doing
      this however, opens another race:
      
      It's possible to have two concurrent read() calls that end up in
      nfs_revalidate_mapping at the same time. The first one clears the
      NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.
      
      Just before calling that though, the other task races in, checks the
      flag and finds it cleared. At that point, it trusts that the mapping is
      good and gets the lock on the page, allowing the read() to be satisfied
      from the cache even though the data is no longer valid.
      
      These effects are easily manifested by running diotest3 from the LTP
      test suite on NFS. That program does a series of DIO writes and buffered
      reads. The operations are serialized and page-aligned but the existing
      code fails the test since it occasionally allows a read to come out of
      the cache incorrectly. While mixing direct and buffered I/O isn't
      recommended, I believe it's possible to hit this in other ways that just
      use buffered I/O, though that situation is much harder to reproduce.
      
      The problem is that the checking/clearing of that flag and the
      invalidation of the mapping really need to be atomic. Fix this by
      serializing concurrent invalidations with a bitlock.
      
      At the same time, we also need to allow other places that check
      NFS_INO_INVALID_DATA to check whether we might be in the middle of
      invalidating the file, so fix up a couple of places that do that
      to look for the new NFS_INO_INVALIDATING flag.
      
      Doing this requires us to be careful not to set the bitlock
      unnecessarily, so this code only does that if it believes it will
      be doing an invalidation.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      d529ef83
    • M
      nfs: handle servers that support only ALLOW ACE type. · 7dd7d959
      Malahal Naineni 提交于
      Currently we support ACLs if the NFS server file system supports both
      ALLOW and DENY ACE types. This patch makes the Linux client work with
      ACLs even if the server supports only 'ALLOW' ACE type.
      Signed-off-by: NMalahal Naineni <malahal@us.ibm.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      7dd7d959
  2. 23 1月, 2014 1 次提交
    • B
      pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done · ed7e5423
      Boaz Harrosh 提交于
      An NFS4ERR_RECALLCONFLICT is returned by server from a GET_LAYOUT
      only when a Server Sent a RECALL do to that GET_LAYOUT, or
      the RECALL and GET_LAYOUT crossed on the wire.
      In any way this means we want to wait at most until in-flight IO
      is finished and the RECALL can be satisfied.
      
      So a proper wait here is more like 1/10 of a second, not 15 seconds
      like we have now. In case of a server bug we delay exponentially
      longer on each retry.
      
      Current code totally craps out performance of very large files on
      most pnfs-objects layouts, because of how the map changes when the
      file has grown into the next raid group.
      
      [Stable: This will patch back to 3.9. If there are earlier still
       maintained trees, please tell me I'll send a patch]
      
      CC: Stable Tree <stable@vger.kernel.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ed7e5423
  3. 22 1月, 2014 1 次提交
    • W
      pnfs: fix BUG in filelayout_recover_commit_reqs · 471252cd
      Weston Andros Adamson 提交于
      cond_resched_lock(cinfo->lock) is called everywhere else while holding
      the cinfo->lock spinlock.  Not holding this lock while calling
      transfer_commit_list in filelayout_recover_commit_reqs causes the BUG
      below.
      
      It's true that we can't hold this lock while calling pnfs_put_lseg,
      because that might try to lock the inode lock - which might be the
      same lock as cinfo->lock.
      
      To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command
      that crosses a stripe boundary and is not page aligned, such as:
      
       dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct
      
      BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161
      in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1
      2 locks held by kworker/0:1/27:
       #0:  (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
       #1:  ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
      CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ #21
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      Workqueue: events nfs_direct_write_schedule_work [nfs]
       0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130  ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40  ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000
      Call Trace:
       [<ffffffff81491256>] dump_stack+0x4d/0x66
       [<ffffffff8105f103>] __might_sleep+0x100/0x105
       [<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files]
       [<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files]
       [<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs]
       [<ffffffff810705df>] ? mark_lock+0x1df/0x224
       [<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4
       [<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs]
       [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
       [<ffffffff81050258>] process_one_work+0x1f6/0x3a5
       [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
       [<ffffffff8105187e>] worker_thread+0x149/0x1f5
       [<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d
       [<ffffffff81056d74>] kthread+0xd2/0xda
       [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
       [<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      471252cd
  4. 21 1月, 2014 1 次提交
  5. 19 1月, 2014 1 次提交
  6. 18 1月, 2014 1 次提交
  7. 14 1月, 2014 10 次提交
  8. 06 1月, 2014 3 次提交
  9. 07 12月, 2013 2 次提交
  10. 05 12月, 2013 2 次提交
    • H
      nfs: fix do_div() warning by instead using sector_div() · 3873d064
      Helge Deller 提交于
      When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
      shown below.  Fix this warning by instead using sector_div() which is provided
      by the kernel.h header file.
      
      fs/nfs/blocklayout/extents.c: In function ‘normalize’:
      include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
      nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
      fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
      include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
       extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      3873d064
    • T
      NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery · f22e5edd
      Trond Myklebust 提交于
      Andy Adamson reports:
      
      The state manager is recovering expired state and recovery OPENs are being
      processed. If kswapd is pruning inodes at the same time, a deadlock can occur
      when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
      resultant layoutreturn gets an error that the state mangager is to handle,
      causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
      
      At the same time an open is waiting for the inode deletion to complete in
      __wait_on_freeing_inode.
      
      If the open is either the open called by the state manager, or an open from
      the same open owner that is holding the NFSv4 sequence id which causes the
      OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
      then the state is deadlocked with kswapd.
      
      The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
      We already know that layouts are dropped on all server reboots, and that
      it has to be coded to deal with the "forgetful client model" that doesn't
      send layoutreturns.
      Reported-by: NAndy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.comSigned-off-by: NTrond Myklebust <Trond.Myklebust@primarydata.com>
      f22e5edd
  11. 21 11月, 2013 3 次提交
    • T
      NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED · 69794ad7
      Trond Myklebust 提交于
      Also ensure that we zero out the stateid mode when exiting
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      69794ad7
    • T
      NFSv4: Update list of irrecoverable errors on DELEGRETURN · c97cf606
      Trond Myklebust 提交于
      If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
      then there is no recovery possible. Just quit without returning an error.
      
      Also, note that the client must not assume that the NFSv4 lease has been
      renewed when it sees an error on DELEGRETURN.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      c97cf606
    • A
      NFSv4 wait on recovery for async session errors · 4a82fd7c
      Andy Adamson 提交于
      When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
      draining is off, but DELEGRETURN can still get a session error.
      The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
      the DELEGRETURN done then restarts the RPC task in the prepare state.
      With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
      session draining off, these DELEGRETURNs will cycle with errors filling up the
      session slots.
      
      This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
      NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
      state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
      as seen in this kernel thread dump:
      
      kernel: 4.12.32.53-ma D 0000000000000000     0  3393      2 0x00000000
      kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140
      kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001
      kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058
      kernel: Call Trace:
      kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240
      kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
      kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc]
      kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90
      kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
      kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90
      kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50
      kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
      kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs]
      kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs]
      kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs]
      kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs]
      kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs]
      kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs]
      kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30
      kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs]
      kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs]
      kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs]
      kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0
      kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0
      kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      The state manager can not therefore process the DELEGRETURN session errors.
      Change the async handler to wait for recovery on session errors.
      Signed-off-by: NAndy Adamson <andros@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      4a82fd7c
  12. 20 11月, 2013 2 次提交
  13. 16 11月, 2013 2 次提交
  14. 15 11月, 2013 1 次提交
  15. 14 11月, 2013 1 次提交
    • J
      nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once · 6d769f1e
      Jeff Layton 提交于
      Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
      NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
      There is no guarantee that doing so will work however, so we can end up
      retrying the call in an infinite loop.
      
      Worse yet, we create the new client using rpc_clone_client_set_auth,
      which creates the new client as a child of the old one. Thus, we can end
      up with a *very* long lineage of rpc_clnts. When we go to put all of the
      references to them, we can end up with a long call chain that can smash
      the stack as each rpc_free_client() call can recurse back into itself.
      
      This patch fixes this by simply ensuring that the SETCLIENTID call will
      only be retried in this situation if the last attempt did not use
      RPC_AUTH_UNIX.
      
      Note too that with this change, we don't need the (i > 2) check in the
      -EACCES case since we now have a more reliable test as to whether we
      should reattempt.
      
      Cc: stable@vger.kernel.org # v3.10+
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      6d769f1e
  16. 05 11月, 2013 4 次提交
  17. 02 11月, 2013 1 次提交