1. 27 Jul, 2017 3 commits
    • NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43
      NeilBrown committed
      posix_fallocate() will allocate space in an NFS file by considering
      the last byte of every 4K block.  If it is before EOF, it will read
      the byte and if it is zero, a zero is written out.  If it is after EOF,
      the zero is unconditionally written.
      
      For the blocks beyond EOF, if NFS believes its cache is valid, it will
      expand these writes to write full pages, and then will merge the pages.
      This results in (typically) 1MB writes.  If NFS believes its cache is
      not valid (particularly if NFS_INO_INVALID_DATA or
      NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
      send the individual 1-byte writes. This results in (typically) 256 times
      as many RPC requests, and can be substantially slower.
      
      Currently nfs_revalidate_mapping() is only used when reading a file or
      mmapping a file, as these are times when the content needs to be
      up-to-date.  Writes don't generally need the cache to be up-to-date, but
      writes beyond EOF can benefit, particularly in the posix_fallocate()
      case.
      
      So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
      i.e. when there is a gap between the end of the file and the start of
      the write.  If the cache is thought to be out of date (as happens after
      taking a file lock), this will cause a GETATTR, and the two flags
      mentioned above will be cleared.  With this, posix_fallocate() on a
      newly locked file does not generate excessive tiny writes.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
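The "256 times as many RPC requests" figure in the commit above follows from the two write sizes: each 1-byte write stands for one 4K block, while a merged write typically covers 1MB. A minimal sketch of that arithmetic, where `write_requests` is a hypothetical helper name and the 4K and 1MB sizes are the typical values quoted in the commit message:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of write requests needed to cover `len`
 * bytes when each request accounts for `per_request` bytes (rounded up).
 */
uint64_t write_requests(uint64_t len, uint64_t per_request)
{
    assert(per_request > 0);
    return (len + per_request - 1) / per_request;
}
```

For a 1GB region, `write_requests(len, 4096)` models the one-byte-per-4K-block case and `write_requests(len, 1 << 20)` models the merged 1MB case; the ratio of the two is 1MB / 4K = 256.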
    • NFS: invalidate file size when taking a lock. · 442ce049
      NeilBrown committed
      Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
      for writing"), NFS would revalidate, or invalidate, the file size when
      taking a lock.  Since that commit it only invalidates the file content.
      
      If the file size is changed on the server while waiting for the lock, the
      client will have an incorrect understanding of the file size and could
      corrupt data.  This particularly happens when writing beyond the
      (supposed) end of file and can easily be demonstrated with
      posix_fallocate().
      
      If an application opens an empty file, waits for a write lock, and then
      calls posix_fallocate(), glibc will determine that the underlying
      filesystem doesn't support fallocate (assuming version 4.1 or earlier)
      and will write out a '0' byte at the end of each 4K page in the region
      being fallocated that is after the end of the file.
      NFS will (usually) detect that these writes are beyond EOF and will
      expand them to cover the whole page, and then will merge the pages.
      Consequently, NFS will write out large blocks of zeroes beyond where it
      thought EOF was.  If EOF had moved, the pre-existing part of the file
      will be over-written.  Locking should have protected against this,
      but it doesn't.
      
      This patch restores the use of nfs_zap_caches() which invalidated the
      cached attributes.  When posix_fallocate() asks for the file size, the
      request will go to the server and get a correct answer.
      
      cc: stable@vger.kernel.org (v4.8+)
      Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
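The application-side sequence described in the commit above (open a file, wait for a write lock, then call posix_fallocate()) can be sketched in user space as follows. This is only an illustration of the calling pattern, not a reproducer for the bug: `lock_then_fallocate` is a hypothetical name, and observing the corruption would additionally need an NFS mount and a second client changing the file size while the lock is awaited.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, wait for a whole-file write lock,
 * then reserve `len` bytes. Returns the resulting file size, or -1 on
 * any failure.
 */
off_t lock_then_fallocate(const char *path, off_t len)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "through end of file" */
    };
    struct stat st;
    off_t size = -1;
    int fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /*
     * F_SETLKW blocks until the lock is granted. On NFS, another client
     * may have grown the file in the meantime, which is why the cached
     * file size must not be trusted after this point.
     */
    if (fcntl(fd, F_SETLKW, &fl) == 0 &&
        posix_fallocate(fd, 0, len) == 0 &&
        fstat(fd, &st) == 0)
        size = st.st_size;

    close(fd);                  /* closing the descriptor drops the lock */
    return size;
}
```

On a local filesystem the call simply returns the reserved length for a fresh file; the interesting behaviour only shows up on an NFS mount with a stale cached size.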
    • NFS: Use raw NFS access mask in nfs4_opendata_access() · 1e6f2095
      Anna Schumaker committed
      Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache") changed how the access results are stored after an
      access() call.  An NFS v4 OPEN might have access bits returned with the
      opendata, so we should use the NFS4_ACCESS values when determining the
      return value in nfs4_opendata_access().
      
      Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache")
      Reported-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Tested-by: Takashi Iwai <tiwai@suse.de>
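As a rough illustration of checking an open against raw NFSv4 access bits, the sketch below tests a required bit against the bits the server returned. The bit values are the ACCESS4 values from the NFSv4 protocol; the helper names `demo_required_for_open` and `demo_access_granted` are hypothetical, and the selection logic is an assumption for illustration rather than the kernel's actual nfs4_opendata_access() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Raw NFSv4 ACCESS bits (protocol values). */
#define NFS4_ACCESS_READ    0x0001u
#define NFS4_ACCESS_LOOKUP  0x0002u
#define NFS4_ACCESS_MODIFY  0x0004u
#define NFS4_ACCESS_EXTEND  0x0008u
#define NFS4_ACCESS_DELETE  0x0010u
#define NFS4_ACCESS_EXECUTE 0x0020u
#define NFS4_ACCESS_ALL     0x003fu

/* Hypothetical: which raw bit an open needs (assumed policy). */
uint32_t demo_required_for_open(bool exec_only, bool is_dir)
{
    if (exec_only)
        return is_dir ? NFS4_ACCESS_LOOKUP : NFS4_ACCESS_EXECUTE;
    return NFS4_ACCESS_READ;
}

/* Hypothetical: true when every required raw bit was granted. */
bool demo_access_granted(uint32_t required, uint32_t granted)
{
    assert((required & ~NFS4_ACCESS_ALL) == 0);
    return (required & ~granted) == 0;
}
```

The point of the sketch is that both sides of the comparison use the same raw NFS4_ACCESS encoding, rather than mixing it with a differently encoded permission mask.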
  2. 22 Jul, 2017 1 commit
  3. 21 Jul, 2017 6 commits
  4. 20 Jul, 2017 5 commits
  5. 14 Jul, 2017 25 commits
    • NFS: Don't run wake_up_bit() when nobody is waiting... · 301bfa48
      Trond Myklebust committed
      "perf lock" shows fairly heavy contention for the bit waitqueue locks
      when doing an I/O heavy workload.
      Use a bit to tell whether or not there has been contention for a lock
      so that we can optimise away the bit waitqueue operations in those cases.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
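The idea of using an extra bit to record contention, so that the unlock path can skip the wake-up when nobody waited, can be sketched with C11 atomics. All names here are hypothetical, the wake-up is replaced by a counter, and the waiter's sleep and re-check logic is deliberately left out; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_LOCK_BIT      0x1u  /* the lock itself */
#define DEMO_CONTENDED_BIT 0x2u  /* set by anyone who failed to take the lock */

struct demo_bit_lock {
    atomic_uint flags;
    unsigned int wakeups;        /* counts calls to the stubbed wake-up path */
};

/* Returns true if the caller acquired the lock. */
bool demo_trylock(struct demo_bit_lock *l)
{
    return !(atomic_fetch_or(&l->flags, DEMO_LOCK_BIT) & DEMO_LOCK_BIT);
}

/* A would-be waiter records that the lock was contended. */
void demo_mark_contended(struct demo_bit_lock *l)
{
    atomic_fetch_or(&l->flags, DEMO_CONTENDED_BIT);
}

/* Clear both bits; only take the wake-up path if someone was waiting. */
void demo_unlock(struct demo_bit_lock *l)
{
    unsigned int old = atomic_fetch_and(&l->flags,
                                        ~(DEMO_LOCK_BIT | DEMO_CONTENDED_BIT));

    assert(old & DEMO_LOCK_BIT);
    if (old & DEMO_CONTENDED_BIT)
        l->wakeups++;            /* stand-in for wake_up_bit() */
}
```

In the uncontended case the unlock is a single atomic operation and the waitqueue is never touched, which is the saving the commit message describes.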
    • nfs: add export operations · 00422483
      Peng Tao committed
      This adds support for opening files on NFS by file handle, both through the
      open_by_handle syscall, and for re-exporting NFS (for example using a
      different version).  The support is very basic for now, as each open by
      handle will have to do an NFSv4 open operation on the wire.  In the
      future this will hopefully be mitigated by an open file cache, as well
      as various optimizations in NFS for this specific case.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      [hch: incorporated various changes, resplit the patches, new changelog]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nfs4: add NFSv4 LOOKUPP handlers · 5b5faaf6
      Jeff Layton committed
      This will be needed in order to implement the get_parent export op
      for nfsd.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • nfs: add a nfs_ilookup helper · f174ff7a
      Peng Tao committed
      This helper allows finding an existing NFS inode by file handle
      and fattr.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      [hch: split from a larger patch]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • nfs: replace d_add with d_splice_alias in atomic_open · 774d9513
      Peng Tao committed
      It's a trivial change, but it follows the knfsd export documentation,
      which asks for d_splice_alias during lookup.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • sunrpc: use constant time memory comparison for mac · 15a8b93f
      Jason A. Donenfeld committed
      Otherwise, we enable a MAC forgery via timing attack.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
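A constant-time comparison avoids the early exit of memcmp(), whose running time reveals how many leading bytes matched. The sketch below shows the general technique in portable C; `demo_ct_memneq` is a hypothetical name, and in-kernel code would rely on an existing kernel helper rather than an open-coded loop like this one (a compiler may also need to be prevented from optimising such a loop).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 0 when the two buffers are equal and 1
 * otherwise. Every byte is always examined, so the running time depends
 * only on `len`, not on where the first difference is.
 */
int demo_ct_memneq(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    unsigned char diff = 0;

    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(pa[i] ^ pb[i]);

    return diff != 0;
}
```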
    • NFSv4.2 fix size storage for nfs42_proc_copy · 1ee48bdd
      Olga Kornievskaia committed
      The return size of COPY is a u64, but it was assigned to an "int" status.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
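Why storing a 64-bit byte count in an "int" is a problem can be shown with a small user-space check. `demo_lost_in_int` is a hypothetical helper; the result of the out-of-range conversion is implementation-defined in C, but on any platform with a 32-bit int the original count cannot be recovered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when `count` does not survive being stored
 * in an int, either because it turns negative (and would look like an
 * error code) or because the high bits are dropped.
 */
bool demo_lost_in_int(uint64_t count)
{
    int status;

    assert(sizeof(int) < sizeof(uint64_t));

    status = (int)count;        /* the narrowing assignment */
    return status < 0 || (uint64_t)status != count;
}
```

A 4096-byte copy fits, while a 5GB copy does not, so large copies are where the wrong size would surface.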
    • xprtrdma: Fix documenting comments in frwr_ops.c · 6afafa77
      Chuck Lever committed
      Clean up.
      
      FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
      is used for the last LOCAL_INV WR in a chain, which is always
      signaled. The documenting comments should reflect that.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Replace PAGE_MASK with offset_in_page() · d933cc32
      Chuck Lever committed
      Clean up.
      
      Reported-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
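The clean-up above swaps an open-coded mask expression for a named helper. A user-space sketch of the equivalence, using hypothetical DEMO_ names and an assumed 4K page size in place of the kernel's PAGE_SIZE, PAGE_MASK and offset_in_page():

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL                    /* assumed page size */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))   /* clears the in-page bits */

/* Open-coded form: mask off everything except the in-page bits. */
unsigned long demo_offset_open_coded(uintptr_t addr)
{
    return (unsigned long)addr & ~DEMO_PAGE_MASK;
}

/* Helper form: same result, but the intent is visible in the name. */
unsigned long demo_offset_in_page(uintptr_t addr)
{
    assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0); /* power of two */
    return (unsigned long)addr & (DEMO_PAGE_SIZE - 1);
}
```

Both functions return the byte offset of an address within its page; the helper form avoids the easy-to-misread double negation of `~PAGE_MASK`.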
    • xprtrdma: FMR does not need list_del_init() · e2f6ef09
      Chuck Lever committed
      Clean up.
      
      Commit 38f1932e ("xprtrdma: Remove FMRs from the unmap list
      after unmapping") utilized list_del_init() to try to prevent some
      list corruption. The corruption was actually caused by the reply
      handler racing with a signal. Now that MR invalidation is properly
      serialized, list_del_init() can safely be replaced.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Demote "connect" log messages · 173b8f49
      Chuck Lever committed
      Some have complained about the log messages generated when xprtrdma
      opens or closes a connection to a server. When an NFS mount is
      mostly idle these can appear every few minutes as the client idles
      out the connection and reconnects.
      
      Connection and disconnection is a normal part of operation, and not
      exceptional, so change these to dprintk's for now. At some point
      all of these will be converted to tracepoints, but that's for
      another day.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration · 838edb94
      Chuck Lever committed
      Transparent State Migration copies a client's lease state from the
      server where a filesystem used to reside to the server where it now
      resides. When an NFSv4.1 client first contacts that destination
      server, it uses EXCHANGE_ID to detect trunking relationships.
      
      The lease that was copied there is returned to that client, but the
      destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
      the client. This is because the lease was confirmed on the source
      server (before it was copied).
      
      When CONFIRMED_R is set, the client throws away the sequence ID
      returned by the server. During a Transparent State Migration, however,
      there's no other way for the client to know what sequence ID to use
      with a lease that's been migrated.
      
      Therefore, the client must save and use the contrived slot sequence
      value returned by the destination server even when CONFIRMED_R is
      set.
      
      Note that some servers always return a seqid of 1 after a migration.
      Reported-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration · 8dcbec6d
      Chuck Lever committed
      Transparent State Migration copies a client's lease state from the
      server where a filesystem used to reside to the server where it now
      resides. When an NFSv4.1 client first contacts that destination
      server, it uses EXCHANGE_ID to detect trunking relationships.
      
      The lease that was copied there is returned to that client, but the
      destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
      the client. This is because the lease was confirmed on the source
      server (before it was copied).
      
      Normally, when CONFIRMED_R is set, a client purges the lease and
      creates a new one. However, that throws away the entire benefit of
      Transparent State Migration.
      
      Therefore, the client must not purge that lease when it is possible
      that Transparent State Migration has occurred.
      Reported-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
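The behaviour described in the two EXCHANGE_ID commits above can be summarised as a small decision sketch: always keep the sequence ID the server returned, and purge the lease on CONFIRMED_R only when migration cannot explain the flag. The flag value is the one defined by the NFSv4.1 protocol; the struct, the function and the `migration_possible` input are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXCHGID4_FLAG_CONFIRMED_R 0x80000000u   /* NFSv4.1 protocol value */

/* Hypothetical, simplified client state. */
struct demo_client {
    uint32_t seqid;          /* sequence ID to use with the lease */
    bool     lease_purged;   /* true if the client discarded the lease */
};

/* Hypothetical handler for an EXCHANGE_ID reply. */
void demo_handle_exchange_id(struct demo_client *clp, uint32_t flags,
                             uint32_t resp_seqid, bool migration_possible)
{
    assert(clp != NULL);

    /* Save the server's sequence ID even when CONFIRMED_R is set. */
    clp->seqid = resp_seqid;

    /* Purge only if the lease is confirmed and migration is ruled out. */
    clp->lease_purged = (flags & EXCHGID4_FLAG_CONFIRMED_R) != 0 &&
                        !migration_possible;
}
```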
    • xprtrdma: Don't defer MR recovery if ro_map fails · 1f541895
      Chuck Lever committed
      Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
      invokes rpcrdma_defer_mr_recovery in some error cases where the MW
      has not even been DMA-mapped yet.
      
      Avoid a DMA-unmapping error by replacing rpcrdma_defer_mr_recovery.
      
      Also note that if ib_dma_map_sg is asked to map 0 nents, it will
      return 0. So the extra "if (i == 0)" check is no longer needed.
      
      Fixes: 42fe28f6 ("xprtrdma: Do not leak an MW during a DMA ...")
      Fixes: 505bbe64 ("xprtrdma: Refactor MR recovery work queues")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix FRWR invalidation error recovery · 8d75483a
      Chuck Lever committed
      When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
      examined, and the MRs reset by hand.
      
      I'm not sure how the existing code can work by comparing R_keys.
      Restructure the logic so that instead it walks the chain of WRs,
      starting from the first bad one.
      
      Make sure to wait for completion if at least one WR was actually
      posted. Otherwise, if the ib_post_send fails, we can end up
      DMA-unmapping the MR while LOCAL_INV operations are in flight.
      
      Commit 7a89f9c6 ("xprtrdma: Honor ->send_request API contract")
      added the rdma_disconnect() call site. The disconnect actually
      causes more problems than it solves, and SQ overruns happen only as
      a result of software bugs. So remove it.
      
      Fixes: d7a21c1b ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix client lock-up after application signal fires · 431af645
      Chuck Lever committed
      After a signal, the RPC client aborts synchronous RPCs running on
      behalf of the signaled application.
      
      The server is still executing those RPCs, and will write the results
      back into the client's memory when it's done. By the time the server
      writes the results, that memory is likely being used for other
      purposes. Therefore xprtrdma has to immediately invalidate all
      memory regions used by those aborted RPCs to prevent the server's
      writes from clobbering that re-used memory.
      
      With FMR memory registration, invalidation takes a relatively long
      time. In fact, the invalidation is often still running when the
      server tries to write the results into the memory regions that are
      being invalidated.
      
      This sets up a race between two processes:
      
      1.  After the signal, xprt_rdma_free calls ro_unmap_safe.
      2.  While ro_unmap_safe is still running, the server replies and
          rpcrdma_reply_handler runs, calling ro_unmap_sync.
      
      Both processes invoke ib_unmap_fmr on the same FMR.
      
      The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
      the same time, but HCAs generally don't tolerate this. Sometimes
      this can result in a system crash.
      
      If the HCA happens to survive, rpcrdma_reply_handler continues. It
      removes the rpc_rqst from rq_list and releases the transport_lock.
      This enables xprt_rdma_free to run in another process, and the
      rpc_rqst is released while rpcrdma_reply_handler is still waiting
      for the ib_unmap_fmr call to finish.
      
      But further down in rpcrdma_reply_handler, the transport_lock is
      taken again, and "rqst" is dereferenced. If "rqst" has already been
      released, this triggers a general protection fault. Since bottom-
      halves are disabled, the system locks up.
      
      Address both issues by reversing the order of the xprt_lookup_rqst
      call and the ro_unmap_sync call. Introduce a separate lookup
      mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
      xprt_lookup_rqst. Now the handler takes the transport_lock once
      and holds it for the XID lookup and RPC completion.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
      Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Rename rpcrdma_req::rl_free · a80d66c9
      Chuck Lever committed
      Clean up: I'm about to use the rl_free field for purposes other than
      a free list. So use a more generic name.
      
      This is a refactoring change only.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
      Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Pass only the list of registered MRs to ro_unmap_sync · 451d26e1
      Chuck Lever committed
      There are rare cases where an rpcrdma_req can be re-used (via
      rpcrdma_buffer_put) while the RPC reply handler is still running.
      This is due to a signal firing at just the wrong instant.
      
      Since commit 9d6b0409 ("xprtrdma: Place registered MWs on a
      per-req list"), rpcrdma_mws are self-contained; i.e., they fully
      describe an MR and scatterlist, and no part of that information is
      stored in struct rpcrdma_req.
      
      As part of closing the above race window, pass only the req's list
      of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
      itself.
      
      Some extra transport header sanity checking is removed. Since the
      client depends on its own recollection of what memory had been
      registered, there doesn't seem to be a way to abuse this change.
      
      And, the check was not terribly effective. If the client had sent
      Read chunks, the "list_empty" test is negative in both of the
      removed cases, which are actually looking for Write or Reply
      chunks.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
      Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Pre-mark remotely invalidated MRs · 4b196dc6
      Chuck Lever committed
      There are rare cases where an rpcrdma_req and its matched
      rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
      reply handler is still using that req. This is typically due to a
      signal firing at just the wrong instant.
      
      As part of closing this race window, avoid using the wrong
      rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
      invalidated while we are sure the rep is still OK to use.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
      Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: On invalidation failure, remove MWs from rl_registered · 04d25b7d
      Chuck Lever committed
      Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
      the list of registered MRs. Ensure that all paths through
      fmr_op_unmap_sync() remove MWs from that list.
      
      Fixes: 9d6b0409 ("xprtrdma: Place registered MWs on a ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: check for nfs_refresh_inode() errors in nfs_fhget() · 26fde4df
      NeilBrown committed
      If an NFS server returns a filehandle that we have previously
      seen, and reports a different type, then nfs_refresh_inode()
      will log a warning and return an error.
      
      nfs_fhget() does not check for this error and may return an
      inode with a different type than the one that the server
      reported.
      
      This is likely to cause confusion, and is one way that
      ->open_context() could return a directory inode as discussed
      in the previous patch.
      
      So if nfs_refresh_inode() returns an error, return that error
      from nfs_fhget() to avoid the confusion propagating.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: guard against confused server in nfs_atomic_open() · eaa2b82c
      NeilBrown committed
      A confused server could return a filehandle for an
      NFSv4 OPEN request, which it previously returned for a directory.
      So the inode returned by ->open_context() in nfs_atomic_open()
      could conceivably be a directory inode.
      
      This has particular implications for the call to
      nfs_file_set_open_context() in nfs_finish_open().
      If that is called on a directory inode, then the nfs_open_context
      that gets stored in the filp->private_data will be linked to
      nfs_inode->open_files.
      
      When the directory is closed, nfs_closedir() will (ultimately)
      free the ->private_data, but not unlink it from nfs_inode->open_files
      (because it doesn't expect an nfs_open_context there).
      
      Subsequently the memory could get used for something else and eventually
      if the ->open_files list is walked, the walker will fall off the end and
      crash.
      
      So: change nfs_finish_open() to only call nfs_file_set_open_context()
      for regular-file inodes.
      
      This failure mode has been seen in a production setting (unknown NFS
      server implementation).  The kernel was v3.0 and the specific sequence
      seen would not affect more recent kernels, but I think a risk is still
      present, and caution is wise.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: only invalidate dentrys that are clearly invalid. · cc89684c
      NeilBrown committed
      Since commit bafc9b75 ("vfs: More precise tests in d_invalidate")
      in v3.18, a return of '0' from ->d_revalidate() will cause the dentry
      to be invalidated even if it has filesystems mounted on it or on a
      descendant.  The mounted filesystem is unmounted.
      
      This means we need to be careful not to return 0 unless the directory
      referred to truly is invalid.  So -ESTALE or -ENOENT should invalidate
      the directory.  Other errors such as -EPERM or -ERESTARTSYS should be
      returned from ->d_revalidate() so they are propagated to the caller.
      
      A particular problem can be demonstrated by:
      
      1/ mount an NFS filesystem using NFSv3 on /mnt
      2/ mount any other filesystem on /mnt/foo
      3/ ls /mnt/foo
      4/ turn off network, or otherwise make the server unable to respond
      5/ ls /mnt/foo &
      6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack
      7/ kill -9 $! # this results in -ERESTARTSYS being returned
      8/ observe that /mnt/foo has been unmounted.
      
      This patch changes nfs_lookup_revalidate() to only treat
        -ESTALE from nfs_lookup_verify_inode() and
        -ESTALE or -ENOENT from ->lookup()
      as indicating an invalid inode.  Other errors are returned.
      
      Also nfs_check_inode_attributes() is changed to return -ESTALE rather
      than -EIO.  This is consistent with the error returned in similar
      circumstances from nfs_update_inode().
      
      As this bug allows any user to unmount a filesystem mounted on an NFS
      filesystem, this fix is suitable for stable kernels.
      
      Fixes: bafc9b75 ("vfs: More precise tests in d_invalidate")
      Cc: stable@vger.kernel.org (v3.18+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
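The error classification described in the commit above can be sketched as a small mapping function: success means the dentry is valid, only -ESTALE (and -ENOENT from a lookup) means it is truly invalid, and every other error is passed back to the caller. The enum and function names are hypothetical, and -ERESTARTSYS is kernel-internal, so user-space errors such as -EPERM and -EINTR stand in for it here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: which step produced the error. */
enum demo_source {
    DEMO_FROM_VERIFY_INODE,   /* stands in for nfs_lookup_verify_inode() */
    DEMO_FROM_LOOKUP,         /* stands in for ->lookup() */
};

/*
 * Hypothetical classifier following the ->d_revalidate() convention:
 *   1  = dentry is valid
 *   0  = dentry is invalid and may be dropped
 *   <0 = error to propagate; the dentry is left alone
 */
int demo_revalidate_result(int err, enum demo_source src)
{
    assert(err <= 0);

    if (err == 0)
        return 1;
    if (err == -ESTALE)
        return 0;
    if (err == -ENOENT && src == DEMO_FROM_LOOKUP)
        return 0;
    return err;               /* e.g. -EPERM or an interrupted call */
}
```

With this mapping, a killed or permission-denied lookup no longer looks like "the directory is gone", so a filesystem mounted below it is not unmounted as a side effect.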
    • PNFS for stateid errors retry against MDS first · 22368ff1
      Olga Kornievskaia committed
      Upon receiving a stateid error such as BAD_STATEID, the client
      should retry the operation against the MDS before deciding to
      do stateid recovery.
      
      Previously, the code would initiate state recovery, which could
      lead to a race in the state manager that could choose an incorrect
      recovery method, resulting in an EIO failure for the
      application.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • PNFS fix EACCESS on commit to DS handling · a0bc01e0
      Olga Kornievskaia committed
      Commit fabbbee0 "PNFS fix fallback to MDS if got error on
      commit to DS" moved the pnfs_set_lo_fail() to unhandled errors,
      which was not correct and led to a kernel oops on umount.
      
      Instead, fix the original EACCESS on commit to DS error by
      getting the new layout and re-doing the IO.
      
      Fixes: fabbbee0 ("PNFS fix fallback to MDS if got error on commit to DS")
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org # v4.12
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>