1. 13 Nov, 2014 7 commits
  2. 05 Nov, 2014 3 commits
  3. 20 Oct, 2014 1 commit
  4. 14 Oct, 2014 1 commit
  5. 13 Oct, 2014 2 commits
    • T
      NFS: Fix a bogus warning in nfs_generic_pgio · b8fb9c30
      Trond Myklebust committed
      It is OK for pageused == pagecount in the loop, as long as we don't add
      another entry to the *pages array. Move the test so that it only triggers
      in that case.
      Reported-by: Steve Dickson <SteveD@redhat.com>
      Fixes: bba5c188 (nfs: disallow duplicate pages in pgio page vectors)
      Cc: Weston Andros Adamson <dros@primarydata.com>
      Cc: stable@vger.kernel.org # 3.16.x
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      b8fb9c30
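The condition described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: `fill_page_vector()` and the use of plain integers as page identifiers are assumptions made for illustration. The point is that reaching `pageused == pagecount` is fine while requests keep landing on the last page, and an error is raised only when a new entry would have to be added.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the page-vector loop: each request is
 * represented only by the id of the page it belongs to.
 * Returns the number of entries used, or -1 if a new entry would
 * overflow the 'pages' array.
 */
long fill_page_vector(const int *req_pages, size_t nreq,
                      int *pages, size_t pagecount)
{
    size_t pageused = 0;
    int have_last = 0;
    int last_page = 0;

    for (size_t i = 0; i < nreq; i++) {
        /* A request on the same page as the previous one adds no entry. */
        if (have_last && req_pages[i] == last_page)
            continue;
        /* Test only here, where another entry is about to be added. */
        if (pageused >= pagecount)
            return -1;
        pages[pageused++] = last_page = req_pages[i];
        have_last = 1;
    }
    return (long)pageused;
}
```

With requests on pages 7, 7, 8, 8 and a two-entry array, the loop keeps running after the array is full and still succeeds; a third distinct page is what triggers the error.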
    • T
      NFS: Fix an uninitialised pointer Oops in the writeback error path · 3caa0c6e
      Trond Myklebust committed
      SteveD reports the following Oops:
       RIP: 0010:[<ffffffffa053461d>]  [<ffffffffa053461d>] __put_nfs_open_context+0x1d/0x100 [nfs]
       RSP: 0018:ffff880fed687b90  EFLAGS: 00010286
       RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000006
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
       RBP: ffff880fed687bc0 R08: 0000000000000092 R09: 000000000000047a
       R10: 0000000000000000 R11: ffff880fed6878d6 R12: ffff880fed687d20
       R13: ffff880fed687d20 R14: 0000000000000070 R15: ffffea000aa33ec0
       FS:  00007fce290f0740(0000) GS:ffff8807ffc60000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000070 CR3: 00000007f2e79000 CR4: 00000000000007e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        0000000000000000 ffff880036c5e510 ffff880fed687d20 ffff880fed687d20
        ffff880036c5e200 ffffea000aa33ec0 ffff880fed687bd0 ffffffffa0534710
        ffff880fed687be8 ffffffffa053d5f0 ffff880036c5e200 ffff880fed687c08
       Call Trace:
        [<ffffffffa0534710>] put_nfs_open_context+0x10/0x20 [nfs]
        [<ffffffffa053d5f0>] nfs_pgio_data_destroy+0x20/0x40 [nfs]
        [<ffffffffa053d672>] nfs_pgio_error+0x22/0x40 [nfs]
        [<ffffffffa053d8f4>] nfs_generic_pgio+0x74/0x2e0 [nfs]
        [<ffffffffa06b18c3>] pnfs_generic_pg_writepages+0x63/0x210 [nfsv4]
        [<ffffffffa053d579>] nfs_pageio_doio+0x19/0x50 [nfs]
        [<ffffffffa053eb84>] nfs_pageio_complete+0x24/0x30 [nfs]
        [<ffffffffa053cb25>] nfs_direct_write_schedule_iovec+0x115/0x1f0 [nfs]
        [<ffffffffa053675f>] ? nfs_get_lock_context+0x4f/0x120 [nfs]
        [<ffffffffa053d252>] nfs_file_direct_write+0x262/0x420 [nfs]
        [<ffffffffa0532d91>] nfs_file_write+0x131/0x1d0 [nfs]
        [<ffffffffa0532c60>] ? nfs_need_sync_write.isra.17+0x40/0x40 [nfs]
        [<ffffffff812127b8>] do_io_submit+0x3b8/0x840
        [<ffffffff81212c50>] SyS_io_submit+0x10/0x20
        [<ffffffff81610f29>] system_call_fastpath+0x16/0x1b
      
      This is due to the calls to nfs_pgio_error() in nfs_generic_pgio(), which
      happen before the nfs_pgio_header's open context is referenced in
      nfs_pgio_rpcsetup().
      Reported-by: Steve Dickson <SteveD@redhat.com>
      Cc: stable@vger.kernel.org # 3.16.x
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      3caa0c6e
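One general way to make an error path safe when it can run before a reference has been taken is to have the teardown tolerate a pointer that was never set. The sketch below shows that pattern in user-space C. The struct and function names are hypothetical, and this is not claimed to be the change the commit actually made.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct open_context {
    int refcount;
};

struct pgio_header {
    struct open_context *context;   /* NULL until a reference is taken */
};

/* Drops one reference; must never be called with a NULL pointer. */
void put_open_context(struct open_context *ctx)
{
    ctx->refcount--;
}

/* Teardown that is safe to call before the context was referenced. */
void pgio_data_destroy(struct pgio_header *hdr)
{
    if (hdr->context != NULL) {
        put_open_context(hdr->context);
        hdr->context = NULL;
    }
}
```

The guard only helps if the header is zero-initialised when it is allocated, so that an unset context really is NULL.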
  6. 09 Oct, 2014 4 commits
  7. 01 Oct, 2014 2 commits
    • A
      NFSv4.1: Fix an NFSv4.1 state renewal regression · d1f456b0
      Andy Adamson committed
      Commit 2f60ea6b ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
      not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
      call, on the wire to renew the NFSv4.1 state if the flag was not set.
      
      The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
      (cl_last_renewal) plus the lease time divided by 3. This threshold is
      arbitrary, and sometimes leads to the following:
      
      In normal operation, the only way a future state renewal call is put on the
      wire is via a call to nfs4_schedule_state_renewal, which schedules a
      nfs4_renew_state workqueue task. nfs4_renew_state determines if the
      NFS4_RENEW_TIMEOUT flag should be set, and then calls
      nfs41_proc_async_sequence, which only gets sent if the NFS4_RENEW_TIMEOUT
      flag is set.
      Then the nfs41_proc_async_sequence rpc_release function schedules
      another state renewal via nfs4_schedule_state_renewal.
      
      Without this change we can get into a state where an application stops
      accessing the NFSv4.1 share and state renewal calls stop because the
      NFS4_RENEW_TIMEOUT flag is _not_ set. The only way to recover
      from this situation is with a clientid re-establishment, once the application
      resumes and the server has timed out the lease and so returns
      NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
      
      An example application:
      open, lock, write a file.
      
      sleep for 6 * lease (could be less)
      
      unlock, close.
      
      In the above example with NFSv4.1 delegations enabled, without this change,
      there are no OP_SEQUENCE state renewal calls during the sleep, and the
      clientid is recovered due to lease expiration on the close.
      
      This issue does not occur with NFSv4.1 delegations disabled, nor with
      NFSv4.0, with or without delegations enabled.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
      Fixes: 2f60ea6b (NFSv4: The NFSv4.0 client must send RENEW calls...)
      Cc: stable@vger.kernel.org # 3.2.x
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      d1f456b0
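The rule for setting NFS4_RENEW_TIMEOUT, as the commit message states it, can be written as a small predicate. This is only a sketch: `renew_timeout_due()` is a hypothetical name, times are plain seconds here (the kernel works in jiffies), and counter wrap-around is ignored.

```c
#include <stdbool.h>

/*
 * True when "now" is after the last renewal plus one third of the
 * lease time. All values are in seconds in this sketch.
 */
bool renew_timeout_due(unsigned long now, unsigned long last_renewal,
                       unsigned long lease_time)
{
    return now > last_renewal + lease_time / 3;
}
```

For a 90-second lease last renewed at t = 1000, the predicate is still false at t = 1030 and becomes true at t = 1031.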
    • A
      NFS: Implement SEEK · 1c6dcbe5
      Anna Schumaker committed
      The SEEK operation is used when an application makes an lseek call with
      either the SEEK_HOLE or SEEK_DATA flags set.  I fall back on
      nfs_file_llseek() if the server does not have SEEK support.
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      1c6dcbe5
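From the application side, the calls that end up in this code path are ordinary `lseek()` calls with `SEEK_DATA` or `SEEK_HOLE`. The two helpers below are hypothetical wrappers written for illustration; they assume Linux with glibc, where these constants are exposed under `_GNU_SOURCE`.

```c
#define _GNU_SOURCE   /* glibc exposes SEEK_DATA and SEEK_HOLE under this macro */
#include <sys/types.h>
#include <unistd.h>

/* Offset of the first data byte at or after 'start', or -1 on error. */
off_t first_data(int fd, off_t start)
{
    return lseek(fd, start, SEEK_DATA);
}

/*
 * Offset of the first hole at or after 'start', or -1 on error.
 * For a file without holes this is the file size, because the end
 * of a file counts as an implicit hole.
 */
off_t first_hole(int fd, off_t start)
{
    return lseek(fd, start, SEEK_HOLE);
}
```

An application does not need to know whether the server supports SEEK; per the commit message, the client falls back on nfs_file_llseek() when it does not.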
  8. 29 Sep, 2014 2 commits
    • T
      NFSv4: fix open/lock state recovery error handling · df817ba3
      Trond Myklebust committed
      The current open/lock state recovery unfortunately does not handle errors
      such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
      it just proceeds as if the state manager has finished recovering.
      This patch ensures that we loop back, handle higher priority errors
      and complete the open/lock state recovery.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      df817ba3
    • T
      NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails · a4339b7b
      Trond Myklebust committed
      If an NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
      CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
      a second time, then the client will currently take this to mean that it must
      declare all locks to be stale, and hence ineligible for reboot recovery.
      
      RFC3530 and RFC5661 both suggest that the client should instead rely on the
      server to respond to ineligible open share, lock and delegation reclaim
      requests with NFS4ERR_NO_GRACE in this situation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      a4339b7b
  9. 26 Sep, 2014 2 commits
  10. 25 Sep, 2014 7 commits
    • N
      NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256
      NeilBrown committed
      Now that nfs_release_page() doesn't block indefinitely, other deadlock
      avoidance mechanisms aren't needed.
       - it doesn't hurt for kswapd to block occasionally.  If it doesn't
         want to block it would clear __GFP_WAIT.  The current_is_kswapd()
         was only added to avoid deadlocks and we have a new approach for
         that.
       - memory allocation in the SUNRPC layer can very rarely try to
         ->releasepage() a page it is trying to handle.  The deadlock
         is removed as nfs_release_page() doesn't block indefinitely.
      
      So we don't need to set PF_FSTRANS for sunrpc network operations any
      more.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      1aff5256
    • N
      NFS: avoid waiting at all in nfs_release_page when congested. · 353db796
      NeilBrown committed
      If nfs_release_page() is called on a sequence of pages which are all
      in the same file which is blocked on COMMIT, each page could
      contribute a 1 second delay which could become excessive.  I have
      seen delays of as much as 208 seconds.
      
      To keep the delay to one second, mark the bdi as write-congested
      if the commit didn't finish.  Once it does finish, the
      write-congested flag will be cleared by nfs_commit_release_pages().
      
      With this, the longest total delay in try_to_free_pages that I have
      seen is under 3 seconds.  With no waiting in nfs_release_page at all
      I have seen delays of nearly 1.5 seconds.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      353db796
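The effect of the write-congested flag can be shown with a small simulation. Nothing here sleeps or touches real pages: `struct bdi_sim`, `release_page_sim()` and `commit_finished_sim()` are hypothetical names, and each bounded wait is simply counted as one second.

```c
#include <stdbool.h>

/* Hypothetical model of the per-device state that matters here. */
struct bdi_sim {
    bool write_congested;   /* set when a bounded wait timed out */
    bool commit_done;       /* true once the COMMIT has completed */
};

/*
 * Simulated release of one page. Returns the seconds "spent" waiting
 * and reports through *released whether the page could be freed.
 */
unsigned release_page_sim(struct bdi_sim *bdi, bool may_wait, bool *released)
{
    unsigned waited = 0;

    if (!bdi->commit_done && may_wait && !bdi->write_congested) {
        waited = 1;                     /* bounded wait for the COMMIT */
        bdi->write_congested = true;    /* it did not finish in time */
    }
    *released = bdi->commit_done;
    return waited;
}

/* Simulated commit completion: clears the congested flag again. */
void commit_finished_sim(struct bdi_sim *bdi)
{
    bdi->commit_done = true;
    bdi->write_congested = false;
}
```

In this model, releasing any number of pages from the same blocked file costs one second in total, because only the first call waits; later calls see the congested flag and return at once.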
    • N
      NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446
      NeilBrown committed
      Support for loop-back mounted NFS filesystems is useful when NFS is
      used to access shared storage in a high-availability cluster.
      
      If the node running the NFS server fails, some other node can mount the
      filesystem and start providing NFS service.  If that node already had
      the filesystem NFS mounted, it will now have it loop-back mounted.
      
      nfsd can suffer a deadlock when allocating memory and entering direct
      reclaim.
      While direct reclaim does not write to the NFS filesystem it can send
      and wait for a COMMIT through nfs_release_page().
      
      This patch modifies nfs_release_page() to wait a limited time for the
      commit to complete - one second.  If the commit doesn't complete
      in this time, nfs_release_page() will fail.  This means it might now
      fail in some cases where it wouldn't before.  These cases are only
      when 'gfp' includes '__GFP_WAIT'.
      
      nfs_release_page() is only called by try_to_release_page(), and that
      can only be called on an NFS page with required 'gfp' flags from
       - page_cache_pipe_buf_steal() in splice.c
       - shrink_page_list() in vmscan.c
       - invalidate_inode_pages2_range() in truncate.c
      
      The first two handle failure quite safely.  The last is only called
      after ->launder_page() has been called, and that will have waited
      for the commit to finish already.
      
      So aborting if the commit takes longer than 1 second is perfectly safe.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      95905446
    • N
      NFS: don't use STABLE writes during writeback. · e87b4c7a
      NeilBrown committed
      commit b31268ac
        FS: Use stable writes when not doing a bulk flush
      
      was a bit heavy handed.
      The particular problem that led to this patch was that
      small writes to an O_SYNC file were being written as UNSTABLE writes
      followed by a commit.
      This is appropriate for large writes (which require multiple NFS
      requests) but for small writes (single NFS request), using
      NFS_FILE_SYNC is more efficient.
      
      So that patch causes the code to select between the two methods
      depending on how many nfs requests get generated.
      
      Unfortunately this ends up applying to non O_SYNC writes as well.
      In particular if you memory-map a file and update random pages, then
      when they are eventually written out by writeback they will go as
      NFS_FILE_SYNC.  This is inefficient and slows down the application.
      
      So: only set FLUSH_COND_STABLE when wbc->sync_mode is WB_SYNC_ALL.
      With this patch:
       - O_SYNC writes are NFS_FILE_SYNC for single requests, and NFS_UNSTABLE
         followed by COMMIT for multiple requests.
       - Writes immediately before a close or fsync follow the same pattern.
       - Non-O_SYNC writes without an fsync or close eventually get flushed
         out as UNSTABLE, and a commit follows eventually as appropriate.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      e87b4c7a
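The selection rule described in this commit message can be sketched as a pure function. The enum and function names below are hypothetical; they only mirror the roles of WB_SYNC_ALL, FLUSH_COND_STABLE and the NFS stability levels named in the message.

```c
/* Mirrors the two writeback modes mentioned in the commit message. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Mirrors NFS_UNSTABLE (a COMMIT follows later) and NFS_FILE_SYNC. */
enum stable_how { UNSTABLE, FILE_SYNC };

/*
 * Conditional-stable behaviour is enabled only for SYNC_ALL writeback.
 * When enabled, a flush that fits in a single request goes out as
 * FILE_SYNC; everything else is sent UNSTABLE.
 */
enum stable_how pick_stability(enum sync_mode mode, unsigned nr_requests)
{
    int cond_stable = (mode == SYNC_ALL);

    if (cond_stable && nr_requests == 1)
        return FILE_SYNC;
    return UNSTABLE;
}
```

Under this rule a single dirty page written back in the background (SYNC_NONE) stays UNSTABLE, which is the case the commit message says was previously slowed down.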
    • N
      NFSv4: use exponential retry on NFS4ERR_DELAY for async requests. · 8478eaa1
      NeilBrown committed
      Currently synchronous NFSv4 requests are retried with an
      exponential timeout (from 1/10 of a second to 15 seconds), but async
      requests always use a 15-second retry.
      
      Some "async" requests are really synchronous though.  The
      async mechanism is used to allow the request to continue if
      the requesting process is killed.
      In those cases, an exponential retry is appropriate.
      
      For example, if two different clients both open a file and
      get a READ delegation, and one client then unlinks the file
      (while still holding an open file descriptor), that unlink
      will use the "silly-rename" handling which is async.
      The first rename will result in NFS4ERR_DELAY while the
      delegation is reclaimed from the other client.  The rename
      will not be retried for 15 seconds, causing an unlink to take
      15 seconds rather than 100msec.
      
      This patch only adds exponential timeout for async unlink and
      async rename.  Other async calls, such as 'close' are sometimes
      waited for so they might benefit from exponential timeout too.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      8478eaa1
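An exponential retry within the bounds quoted in the commit message (1/10 of a second up to 15 seconds) can be sketched as follows. The names are hypothetical, times are in milliseconds, and doubling on every retry is an assumption; the message itself only says "exponential".

```c
#define RETRY_MIN_MS 100L     /* 1/10 of a second */
#define RETRY_MAX_MS 15000L   /* 15 seconds */

/*
 * Returns the delay to use for this retry and updates *timeout_ms
 * for the next one. Start with *timeout_ms == 0.
 */
long next_delay_ms(long *timeout_ms)
{
    long delay;

    if (*timeout_ms <= 0)
        *timeout_ms = RETRY_MIN_MS;
    if (*timeout_ms > RETRY_MAX_MS)
        *timeout_ms = RETRY_MAX_MS;
    delay = *timeout_ms;
    *timeout_ms <<= 1;        /* assumed growth factor: double each time */
    return delay;
}
```

With these values the first retry happens after 100 ms rather than 15 seconds, which matches the unlink example in the commit message, and the delay reaches the 15-second cap on the ninth retry.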
    • O
      Fixing lease renewal · 8faaa6d5
      Olga Kornievskaia committed
      Commit c9fdeb28 removed a 'continue' after checking if the lease needs
      to be renewed. However, if the client hasn't moved, the code falls through
      to starting reboot recovery erroneously (i.e., it sends an open reclaim and
      gets back a stale_clientid error) before recovering from the stale_clientid
      error on the renew operation.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Fixes: c9fdeb28 (NFS: Add basic migration support to state manager thread)
      Cc: stable@vger.kernel.org # 3.13+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      8faaa6d5
    • F
      nfs: fix duplicate proc entries · 2f3169fb
      Fabian Frederick committed
      Commit 65b38851
      ("NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes")
      
      updated the following function:
      static int nfs_volume_list_open(struct inode *inode, struct file *file)
      
      It used &nfs_server_list_ops instead of &nfs_volume_list_ops,
      which means that cat /proc/fs/nfsfs/volumes shows the same output as
      cat /proc/fs/nfsfs/servers.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Fixes: 65b38851 (NFS: Fix /proc/fs/nfsfs/servers and...)
      Cc: stable@vger.kernel.org # 3.4.x+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      2f3169fb
  11. 22 Sep, 2014 1 commit
  12. 19 Sep, 2014 3 commits
    • K
      sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() · f139caf2
      Kirill Tkhai committed
      schedule(), io_schedule() and schedule_timeout() always return
      with TASK_RUNNING state set, so one more setting is unnecessary.
      
      (All places in the patch are visibly correct; the only exception is
       kiblnd_scheduler() from:

            drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c

       Its schedule() is one line above the standard 3 lines of unified diff
       context.)

      There are no places where set_current_state() is used for mb().
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Anil Belur <askb23@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dmitry Eremin <dmitry.eremin@intel.com>
      Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Isaac Huang <he.huang@intel.com>
      Cc: James E.J. Bottomley <JBottomley@parallels.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liang Zhen <liang.zhen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masaru Nomura <massa.nomura@gmail.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: Peng Tao <bergwolf@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Robert Love <robert.w.love@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Cc: Zi Shen Lim <zlim.lnx@gmail.com>
      Cc: devel@driverdev.osuosl.org
      Cc: dm-devel@redhat.com
      Cc: dri-devel@lists.freedesktop.org
      Cc: fcoe-devel@open-fcoe.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-afs@lists.infradead.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: qla2xxx-upstream@qlogic.com
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f139caf2
    • T
      NFSv4: Fix another bug in the close/open_downgrade code · cd9288ff
      Trond Myklebust committed
      James Drews reports another bug whereby the NFS client is now sending
      an OPEN_DOWNGRADE in a situation where it should really have sent a
      CLOSE: the client is opening the file for O_RDWR, but then trying to
      do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.
      Reported-by: James Drews <drews@engr.wisc.edu>
      Link: http://lkml.kernel.org/r/541AD7E5.8020409@engr.wisc.edu
      Fixes: aee7af35 (NFSv4: Fix problems with close in the presence...)
      Cc: stable@vger.kernel.org # 2.6.33+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      cd9288ff
    • S
      NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists() · 080af20c
      Steve Dickson committed
      There is a race between nfs4_state_manager() and
      nfs_server_remove_lists() that happens during an NFSv3 mount.
      
      The v3 mount notices there is already a superblock, so
      nfs_server_remove_lists() is called, which uses the nfs_client_lock
      spin lock to synchronize access to the client list.
      
      At the same time nfs4_state_manager() is running through
      the client list looking for work to do, using the same
      lock. When nfs4_state_manager() wins the race to the
      list, a v3 client pointer is found and not ignored
      properly which causes the panic.
      
      Moving some protocol checks before the state checking
      avoids the panic.
      
      CC: Stable Tree <stable@vger.kernel.org>
      Signed-off-by: Steve Dickson <steved@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      080af20c
  13. 17 Sep, 2014 1 commit
  14. 16 Sep, 2014 1 commit
  15. 13 Sep, 2014 3 commits