1. 15 Aug, 2017 7 commits
  2. 12 Aug, 2017 1 commit
  3. 10 Aug, 2017 1 commit
  4. 09 Aug, 2017 1 commit
  5. 02 Aug, 2017 2 commits
  6. 29 Jul, 2017 1 commit
    • B
      NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter · b7dbcc0e
      Benjamin Coddington committed
      nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
      same region protected by the wait_queue's lock after checking for a
      notification from CB_NOTIFY_LOCK callback.  However, after releasing that
      lock, a wakeup for that task may race in before the call to
      freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
      freezable_schedule_timeout_interruptible() will set the state back to
      TASK_INTERRUPTIBLE before the task will sleep.  The result is that the task
      will sleep for the entire duration of the timeout.
      
      Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
      freezable_schedule_timeout() instead.
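
The race described above can be modelled outside the kernel. The following is a minimal single-threaded sketch, not the kernel code: the `sim_*` names are hypothetical, the task states are reduced to two values, and the wakeup is assumed to land exactly between releasing the lock and calling the scheduling helper.

```c
#include <stdbool.h>

/* Simplified task states; the names mirror the kernel's, the values do not. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct sim_task {
    enum task_state state;
};

/* A wakeup makes the task runnable again (the kernel passes through
 * TASK_WAKING first; that intermediate state is omitted here). */
static void sim_wake_up(struct sim_task *t)
{
    t->state = TASK_RUNNING;
}

/* Models the plain timeout helper: the task only sleeps if it is not
 * runnable. Returns true if it slept for the whole timeout. */
static bool sim_schedule_timeout(struct sim_task *t)
{
    bool slept = (t->state != TASK_RUNNING);

    t->state = TASK_RUNNING;
    return slept;
}

/* Models the *_interruptible() variant: it resets the state before
 * sleeping, which discards a wakeup that has already arrived. */
static bool sim_schedule_timeout_interruptible(struct sim_task *t)
{
    t->state = TASK_INTERRUPTIBLE;
    return sim_schedule_timeout(t);
}

/* Returns true if the waiter misses the wakeup and sleeps the full timeout. */
bool sim_wait_misses_wakeup(bool use_interruptible_variant)
{
    struct sim_task t = { TASK_RUNNING };

    /* Locked section: no notification seen yet, so mark the task sleeping. */
    t.state = TASK_INTERRUPTIBLE;

    /* Lock released; the CB_NOTIFY_LOCK wakeup races in here. */
    sim_wake_up(&t);

    return use_interruptible_variant
        ? sim_schedule_timeout_interruptible(&t)
        : sim_schedule_timeout(&t);
}
```

In this model, `sim_wait_misses_wakeup(true)` reports a missed wakeup, while `sim_wait_misses_wakeup(false)` returns without sleeping, which is the behaviour the fix relies on.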
      
      Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      b7dbcc0e
  7. 27 Jul, 2017 3 commits
    • N
      NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43
      NeilBrown committed
      posix_fallocate() will allocate space in an NFS file by considering
      the last byte of every 4K block.  If it is before EOF, it will read
      the byte and if it is zero, a zero is written out.  If it is after EOF,
      the zero is unconditionally written.
      
      For the blocks beyond EOF, if NFS believes its cache is valid, it will
      expand these writes to write full pages, and then will merge the pages.
      This results in (typically) 1MB writes.  If NFS believes its cache is
      not valid (particularly if NFS_INO_INVALID_DATA or
      NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
      send the individual 1-byte writes. This results in (typically) 256 times
      as many RPC requests, and can be substantially slower.
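
The factor of 256 follows from the sizes mentioned above: one merged 1MB write covers 256 blocks of 4K. A small sketch of that arithmetic, with hypothetical function names and assuming one RPC per write:

```c
#include <stdint.h>

/* Granularity at which posix_fallocate() touches the file, per the text above. */
#define FALLOC_BLOCK_SIZE 4096u

/* RPCs needed when the 1-byte writes are expanded and merged into
 * writes of at most `wsize` bytes (rounded up). */
uint64_t rpcs_merged(uint64_t len, uint64_t wsize)
{
    return (len + wsize - 1) / wsize;
}

/* RPCs needed when every 4K block gets its own 1-byte write. */
uint64_t rpcs_single_byte(uint64_t len)
{
    return (len + FALLOC_BLOCK_SIZE - 1) / FALLOC_BLOCK_SIZE;
}
```

For a 1MB region this gives 1 merged write against 256 single-byte writes.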
      
      Currently nfs_revalidate_mapping() is only used when reading a file or
      mmapping a file, as these are times when the content needs to be
      up-to-date.  Writes don't generally need the cache to be up-to-date, but
      writes beyond EOF can benefit, particularly in the posix_fallocate()
      case.
      
      So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
      i.e. when there is a gap between the end of the file and the start of
      the write.  If the cache is thought to be out of date (as happens after
      taking a file lock), this will cause a GETATTR, and the two flags
      mentioned above will be cleared.  With this, posix_fallocate() on a
      newly locked file does not generate excessive tiny writes.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      6ba80d43
    • N
      NFS: invalidate file size when taking a lock. · 442ce049
      NeilBrown committed
      Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
      for writing"), NFS would revalidate, or invalidate, the file size when
      taking a lock.  Since that commit it only invalidates the file content.
      
      If the file size is changed on the server while waiting for the lock, the
      client will have an incorrect understanding of the file size and could
      corrupt data.  This particularly happens when writing beyond the
      (supposed) end of file and can easily be demonstrated with
      posix_fallocate().
      
      If an application opens an empty file, waits for a write lock, and then
      calls posix_fallocate(), glibc will determine that the underlying
      filesystem doesn't support fallocate (assuming version 4.1 or earlier)
      and will write out a '0' byte at the end of each 4K page in the region
      being fallocated that is after the end of the file.
      NFS will (usually) detect that these writes are beyond EOF and will
      expand them to cover the whole page, and then will merge the pages.
      Consequently, NFS will write out large blocks of zeroes beyond where it
      thought EOF was.  If EOF had moved, the pre-existing part of the file
      will be over-written.  Locking should have protected against this,
      but it doesn't.
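
The application-side sequence described above might look like the following userland sketch. The function name `lock_then_fallocate` is hypothetical, and the code only shows the call order (blocking write lock, then `posix_fallocate()`); it does not reproduce the corruption by itself, which also needs a second client changing the file size on the server.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Take a blocking write lock on the whole file, then preallocate `len`
 * bytes from offset 0. Returns 0 on success or an errno value. */
int lock_then_fallocate(int fd, off_t len)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0, /* 0 means "until end of file", i.e. the whole file */
    };

    /* F_SETLKW waits until the lock is granted. */
    if (fcntl(fd, F_SETLKW, &fl) == -1)
        return errno;

    /* posix_fallocate() returns the error number directly. */
    return posix_fallocate(fd, 0, len);
}
```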
      
      This patch restores the use of nfs_zap_caches() which invalidated the
      cached attributes.  When posix_fallocate() asks for the file size, the
      request will go to the server and get a correct answer.
      
      cc: stable@vger.kernel.org (v4.8+)
      Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      442ce049
    • A
      NFS: Use raw NFS access mask in nfs4_opendata_access() · 1e6f2095
      Anna Schumaker committed
      Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache") changed how the access results are stored after an
      access() call.  An NFS v4 OPEN might have access bits returned with the
      opendata, so we should use the NFS4_ACCESS values when determining the
      return value in nfs4_opendata_access().
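
As an illustration of checking an open against raw access bits, here is a minimal sketch. The `ACCESS4_*` values are the ones defined by the NFSv4 protocol; the `WANT_*` flags and `open_access_ok()` are hypothetical stand-ins and are not the kernel's actual interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACCESS bits as defined by the NFSv4 protocol. */
#define ACCESS4_READ    0x01u
#define ACCESS4_LOOKUP  0x02u
#define ACCESS4_MODIFY  0x04u
#define ACCESS4_EXTEND  0x08u
#define ACCESS4_DELETE  0x10u
#define ACCESS4_EXECUTE 0x20u

/* Hypothetical open intents, for illustration only. */
#define WANT_READ 0x1u
#define WANT_EXEC 0x2u

/* Returns true if the raw access bits granted by the server cover
 * what the open needs. */
bool open_access_ok(unsigned int want, uint32_t granted)
{
    uint32_t required = 0;

    if (want & WANT_EXEC)
        required = ACCESS4_EXECUTE;
    else if (want & WANT_READ)
        required = ACCESS4_READ;

    /* Every required bit must be present in the granted mask. */
    return (required & ~granted) == 0;
}
```

The point of the sketch is that both sides of the comparison use the same bit namespace; comparing raw protocol bits against a differently numbered permission mask would test the wrong bits.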
      
      Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache")
      Reported-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Tested-by: Takashi Iwai <tiwai@suse.de>
      1e6f2095
  8. 22 Jul, 2017 1 commit
  9. 21 Jul, 2017 5 commits
  10. 20 Jul, 2017 5 commits
  11. 14 Jul, 2017 13 commits
    • T
      NFS: Don't run wake_up_bit() when nobody is waiting... · b4f937cf
      Trond Myklebust committed
      "perf lock" shows fairly heavy contention for the bit waitqueue locks
      when doing an I/O heavy workload.
      Use a bit to tell whether or not there has been contention for a lock
      so that we can optimise away the bit waitqueue operations in those cases.
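
The idea of skipping the wakeup when nobody has waited can be sketched in userland with C11 atomics. This is an analogy under stated assumptions, not the kernel change: `struct bit_lock`, the two flag names, and the counting stand-in for `wake_up_bit()` are all hypothetical, and a real waiter would have to re-check the lock after setting the contended flag and before sleeping.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_LOCKED    0x1u
#define FLAG_CONTENDED 0x2u

struct bit_lock {
    atomic_uint flags;
    unsigned int wakeups; /* how often the (expensive) wakeup path ran */
};

/* Stand-in for wake_up_bit(): only counts calls. */
static void sim_wake_waiters(struct bit_lock *l)
{
    l->wakeups++;
}

/* Try to take the lock; on failure, record that there was contention. */
bool bit_trylock(struct bit_lock *l)
{
    unsigned int old = atomic_fetch_or(&l->flags, FLAG_LOCKED);

    if (old & FLAG_LOCKED) {
        atomic_fetch_or(&l->flags, FLAG_CONTENDED);
        return false;
    }
    return true;
}

/* Release the lock; touch the wait queue only if someone contended. */
void bit_unlock(struct bit_lock *l)
{
    unsigned int old = atomic_fetch_and(&l->flags,
                                        ~(FLAG_LOCKED | FLAG_CONTENDED));

    if (old & FLAG_CONTENDED)
        sim_wake_waiters(l);
}
```

An uncontended lock/unlock pair leaves `wakeups` at zero; only an unlock that follows a failed `bit_trylock()` runs the wakeup path.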
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      b4f937cf
    • P
      nfs: add export operations · 20fa1902
      Peng Tao committed
      This adds support for opening files on NFS by file handle, both through the
      open_by_handle syscall, and for re-exporting NFS (for example using a
      different version).  The support is very basic for now, as each open by
      handle will have to do an NFSv4 open operation on the wire.  In the
      future this will hopefully be mitigated by an open file cache, as well
      as various optimizations in NFS for this specific case.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      [hch: incorporated various changes, resplit the patches, new changelog]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      20fa1902
    • T
      NFS: Don't run wake_up_bit() when nobody is waiting... · 301bfa48
      Trond Myklebust committed
      "perf lock" shows fairly heavy contention for the bit waitqueue locks
      when doing an I/O heavy workload.
      Use a bit to tell whether or not there has been contention for a lock
      so that we can optimise away the bit waitqueue operations in those cases.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      301bfa48
    • J
      nfs4: add NFSv4 LOOKUPP handlers · 5b5faaf6
      Jeff Layton committed
      This will be needed in order to implement the get_parent export op
      for nfsd.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      5b5faaf6
    • P
      nfs: add export operations · 00422483
      Peng Tao committed
      This adds support for opening files on NFS by file handle, both through the
      open_by_handle syscall, and for re-exporting NFS (for example using a
      different version).  The support is very basic for now, as each open by
      handle will have to do an NFSv4 open operation on the wire.  In the
      future this will hopefully be mitigated by an open file cache, as well
      as various optimizations in NFS for this specific case.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      [hch: incorporated various changes, resplit the patches, new changelog]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      00422483
    • P
      nfs: add a nfs_ilookup helper · f174ff7a
      Peng Tao committed
      This helper allows finding an existing NFS inode by its file handle
      and fattr.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      [hch: split from a larger patch]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      f174ff7a
    • P
      nfs: replace d_add with d_splice_alias in atomic_open · 774d9513
      Peng Tao committed
      It's a trivial change, but it follows the knfsd export documentation,
      which asks for d_splice_alias during lookup.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      774d9513
    • O
      NFSv4.2 fix size storage for nfs42_proc_copy · 1ee48bdd
      Olga Kornievskaia committed
      Return size of COPY is u64 but it was assigned to an "int" status.
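
To illustrate why a 64-bit byte count should not travel through an `int` status, here is a small sketch that keeps the two in separately typed fields. `struct copy_result` and `copy_done()` are hypothetical names chosen for this example and do not reflect how the kernel function is structured.

```c
#include <stdint.h>

/* Keep the error code and the byte count in separately typed fields so
 * that a large count is never squeezed into an int. */
struct copy_result {
    int status;     /* 0 on success, negative errno-style value on failure */
    uint64_t count; /* bytes copied; may exceed what an int can hold */
};

/* Build a result from an RPC status and the count reported by the server. */
struct copy_result copy_done(int rpc_status, uint64_t bytes_copied)
{
    struct copy_result r = {
        .status = rpc_status,
        .count = (rpc_status == 0) ? bytes_copied : 0,
    };

    return r;
}
```

A copy of 5 GiB, for example, does not fit in a 32-bit `int`, but is preserved exactly in the `count` field.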
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      1ee48bdd
    • C
      NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration · 838edb94
      Chuck Lever committed
      Transparent State Migration copies a client's lease state from the
      server where a filesystem used to reside to the server where it now
      resides. When an NFSv4.1 client first contacts that destination
      server, it uses EXCHANGE_ID to detect trunking relationships.
      
      The lease that was copied there is returned to that client, but the
      destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
      the client. This is because the lease was confirmed on the source
      server (before it was copied).
      
      When CONFIRMED_R is set, the client throws away the sequence ID
      returned by the server. During a Transparent State Migration, however,
      there's no other way for the client to know what sequence ID to use
      with a lease that's been migrated.
      
      Therefore, the client must save and use the contrived slot sequence
      value returned by the destination server even when CONFIRMED_R is
      set.
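
The difference between the old and the new handling of the returned sequence ID can be sketched as follows. `EXCHGID4_FLAG_CONFIRMED_R` uses the value from the NFSv4.1 protocol; `struct client_sim`, `record_exchange_id()` and the `save_when_confirmed` switch are hypothetical and only exist to contrast the two behaviours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag value as defined by the NFSv4.1 protocol. */
#define EXCHGID4_FLAG_CONFIRMED_R 0x80000000u

/* Minimal stand-in for the client state that matters here. */
struct client_sim {
    uint32_t seqid;
};

/* Record the result of an EXCHANGE_ID reply.
 * save_when_confirmed == false models discarding the sequence ID when
 * CONFIRMED_R is set; true models always saving it. */
void record_exchange_id(struct client_sim *clp, uint32_t flags,
                        uint32_t seqid, bool save_when_confirmed)
{
    bool confirmed = (flags & EXCHGID4_FLAG_CONFIRMED_R) != 0;

    if (!confirmed || save_when_confirmed)
        clp->seqid = seqid;
}
```

With `save_when_confirmed` set, a client that receives a confirmed lease after a migration still learns which sequence ID the destination server expects.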
      
      Note that some servers always return a seqid of 1 after a migration.
      Reported-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      838edb94
    • C
      NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration · 8dcbec6d
      Chuck Lever committed
      Transparent State Migration copies a client's lease state from the
      server where a filesystem used to reside to the server where it now
      resides. When an NFSv4.1 client first contacts that destination
      server, it uses EXCHANGE_ID to detect trunking relationships.
      
      The lease that was copied there is returned to that client, but the
      destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
      the client. This is because the lease was confirmed on the source
      server (before it was copied).
      
      Normally, when CONFIRMED_R is set, a client purges the lease and
      creates a new one. However, that throws away the entire benefit of
      Transparent State Migration.
      
      Therefore, the client must not purge that lease when it is possible
      that Transparent State Migration has occurred.
      Reported-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Xuan Qi <xuan.qi@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      8dcbec6d
    • N
      NFS: check for nfs_refresh_inode() errors in nfs_fhget() · 26fde4df
      NeilBrown committed
      If an NFS server returns a filehandle that we have previously
      seen, and reports a different type, then nfs_refresh_inode()
      will log a warning and return an error.
      
      nfs_fhget() does not check for this error and may return an
      inode with a different type than the one that the server
      reported.
      
      This is likely to cause confusion, and is one way that
      ->open_context() could return a directory inode as discussed
      in the previous patch.
      
      So if nfs_refresh_inode() returns an error, return that error
      from nfs_fhget() to avoid the confusion propagating.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      26fde4df
    • N
      NFS: guard against confused server in nfs_atomic_open() · eaa2b82c
      NeilBrown committed
      A confused server could return a filehandle for an
      NFSv4 OPEN request, which it previously returned for a directory.
      So the inode returned by  ->open_context() in nfs_atomic_open()
      could conceivably be a directory inode.
      
      This has particular implications for the call to
      nfs_file_set_open_context() in nfs_finish_open().
      If that is called on a directory inode, then the nfs_open_context
      that gets stored in the filp->private_data will be linked to
      nfs_inode->open_files.
      
      When the directory is closed, nfs_closedir() will (ultimately)
      free the ->private_data, but not unlink it from nfs_inode->open_files
      (because it doesn't expect an nfs_open_context there).
      
      Subsequently the memory could get used for something else and eventually
      if the ->open_files list is walked, the walker will fall off the end and
      crash.
      
      So: change nfs_finish_open() to only call nfs_file_set_open_context()
      for regular-file inodes.
      
      This failure mode has been seen in a production setting (unknown NFS
      server implementation).  The kernel was v3.0 and the specific sequence
      seen would not affect more recent kernels, but I think a risk is still
      present, and caution is wise.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      eaa2b82c
    • N
      NFS: only invalidate dentrys that are clearly invalid. · cc89684c
      NeilBrown committed
      Since commit bafc9b75 ("vfs: More precise tests in d_invalidate")
      in v3.18, a return of '0' from ->d_revalidate() will cause the dentry
      to be invalidated even if it has filesystems mounted on it or on a
      descendant.  The mounted filesystem is unmounted.
      
      This means we need to be careful not to return 0 unless the directory
      referred to truly is invalid.  So -ESTALE or -ENOENT should invalidate
      the directory.  Other errors such as -EPERM or -ERESTARTSYS should be
      returned from ->d_revalidate() so they are propagated to the caller.
      
      A particular problem can be demonstrated by:
      
      1/ mount an NFS filesystem using NFSv3 on /mnt
      2/ mount any other filesystem on /mnt/foo
      3/ ls /mnt/foo
      4/ turn off network, or otherwise make the server unable to respond
      5/ ls /mnt/foo &
      6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack
      7/ kill -9 $! # this results in -ERESTARTSYS being returned
      8/ observe that /mnt/foo has been unmounted.
      
      This patch changes nfs_lookup_revalidate() to only treat
        -ESTALE from nfs_lookup_verify_inode() and
        -ESTALE or -ENOENT from ->lookup()
      as indicating an invalid inode.  Other errors are returned.
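
The error classification described above can be sketched as two small helpers. The function names are hypothetical; the convention assumed here is the one stated in the text, where a return of 0 means "dentry is invalid" and a negative errno is passed back to the caller.

```c
#include <errno.h>

/* Result for an error from the inode verification step:
 * only -ESTALE marks the dentry invalid (0); anything else is propagated. */
int revalidate_result_from_verify_error(int err)
{
    return (err == -ESTALE) ? 0 : err;
}

/* Result for an error from the ->lookup() step:
 * -ESTALE and -ENOENT mark the dentry invalid (0); anything else is propagated. */
int revalidate_result_from_lookup_error(int err)
{
    if (err == -ESTALE || err == -ENOENT)
        return 0;
    return err;
}
```

With this split, an interrupted or permission-denied lookup (for example `-EINTR` or `-EPERM` in userland terms; `-ERESTARTSYS` is kernel-internal) no longer looks like an invalid dentry, so nothing mounted beneath it is unmounted as a side effect.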
      
      Also nfs_check_inode_attributes() is changed to return -ESTALE rather
      than -EIO.  This is consistent with the error returned in similar
      circumstances from nfs_update_inode().
      
      As this bug allows any user to unmount a filesystem mounted on an NFS
      filesystem, this fix is suitable for stable kernels.
      
      Fixes: bafc9b75 ("vfs: More precise tests in d_invalidate")
      Cc: stable@vger.kernel.org (v3.18+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      cc89684c