1. 15 Jan, 2020: 1 commit
  2. 18 Nov, 2019: 1 commit
  3. 21 May, 2019: 1 commit
  4. 26 Apr, 2019: 2 commits
  5. 21 Feb, 2019: 3 commits
      pNFS: Avoid read/modify/write when it is not necessary · 2cde04e9
      Committed by Kazuo Ito
      As the block and SCSI layouts can only read/write fixed-length
      blocks, we must perform read-modify-write when the data to be written
      is not aligned to a block boundary or is smaller than the block size.
      (612aa983 pnfs: add flag to force read-modify-write in ->write_begin)
      
      The current code decides whether read-modify-write is needed on
      block-oriented pNFS layouts by checking only !PageUptodate(page),
      but that condition is also true when overwriting any uncached
      portion of an existing file, making such overwrites excessively slow
      even when they are block-aligned.
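A minimal userspace sketch of a finer-grained test, assuming a 4 KiB page and working at page granularity for simplicity. The helper `full_page_write()` and its `page_len` parameter (bytes of file data held in the page, 0 if the page lies wholly beyond EOF) are hypothetical names, not the kernel's code. The idea: a write that starts at the beginning of the page and covers all of its existing data, or that lands on a page with no existing data, replaces everything, so nothing needs to be read first even though the page is not cached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: returns true when a write of `len` bytes at file
 * position `pos` replaces all file data held in its page, so the old
 * contents never need to be read first.
 *
 * page_len: number of bytes of file data in that page
 *           (0 if the page lies entirely beyond EOF).
 * Assumes the write does not cross a page boundary.
 */
bool full_page_write(uint64_t pos, unsigned int len, unsigned int page_len)
{
    unsigned int offset = (unsigned int)(pos & (SKETCH_PAGE_SIZE - 1));
    unsigned int end = offset + len;

    /* No existing data in the page, or the write covers all of it. */
    return page_len == 0 || (offset == 0 && end >= page_len);
}
```

For example, `full_page_write(0, 4096, 4096)` is true (an aligned full-page overwrite), while `full_page_write(0, 512, 4096)` is false, so only the latter would still need the read.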
      
      The change does not affect the optimization for modify-write-read
      cases (38c73044 NFS: read-modify-write page updating),
      because partial update of !PageUptodate() pages can only happen
      in layouts that can do arbitrary length read/write and never
      in block-based ones.
      
      Testing results:
      
      We ran fio on one of the pNFS clients running a 4.20 kernel
      (vanilla and patched) in the configuration below to read, write and
      overwrite files on the storage array, exported as a pNFS share by the server.
      
       pNFS clients ---1G Ethernet--- pNFS server
       (HP DL360 G8)                  (HP DL360 G8)
             |                              |
             |                              |
             +------8G Fiber Channel--------+
                           |
                     Storage Array
                       (HP P6350)
      
      Throughput of overwrite (both buffered and O_SYNC) is noticeably
      improved.
      
      Ops.     |block size|   Throughput   |
               |  (KiB)   |    (MiB/s)     |
               |          |  4.20 | patched|
      ---------+----------+----------------+
      buffered |         4|  21.3 |  232   |
      overwrite|        32|  22.2 |  256   |
               |       512|  22.4 |  260   |
      ---------+----------+----------------+
      O_SYNC   |         4|   3.84|    4.77|
      overwrite|        32|  12.2 |   32.0 |
               |       512|  18.5 |  152   |
      ---------+----------+----------------+
      
      Read and write throughput (buffered and O_SYNC) from the same client
      is, as expected, essentially unaffected by the patch in either direction.
      
      Ops.     |block size|   Throughput   |
               |  (KiB)   |    (MiB/s)     |
               |          |  4.20 | patched|
      ---------+----------+----------------+
      read     |         4| 548   |  550   |
               |        32| 547   |  551   |
               |       512| 548   |  551   |
      ---------+----------+----------------+
      buffered |         4| 237   |  244   |
      write    |        32| 261   |  268   |
               |       512| 265   |  272   |
      ---------+----------+----------------+
      O_SYNC   |         4|   0.46|    0.46|
      write    |        32|   3.60|    3.57|
               |       512| 105   |  106   |
      ---------+----------+----------------+
      Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
      Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      pNFS: Fix potential corruption of page being written · 97ae91bb
      Committed by Kazuo Ito
      nfs_want_read_modify_write() didn't check for !PagePrivate when a pNFS
      block or SCSI layout was in use, so data could be lost permanently if
      the page being written was filled by a read before the write completed.
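The missing condition can be sketched as a predicate over hypothetical page-state flags; the struct and function names below are illustrative stand-ins, not kernel APIs. `has_pending_write` plays the role of PagePrivate() (a page that already carries a write request), `layout_reads_whole_page` stands for a block or SCSI layout, and `full_page_write` is assumed to be computed elsewhere. The order of the checks is the point: a page holding data the server has not yet received must never be refilled by a read, whatever the layout.

```c
#include <stdbool.h>

/* Hypothetical summary of the page state consulted before a buffered write. */
struct page_state {
    bool uptodate;                /* stands in for PageUptodate() */
    bool has_pending_write;       /* stands in for PagePrivate() */
    bool full_page_write;         /* write replaces all data in the page */
    bool layout_reads_whole_page; /* e.g. pNFS block or SCSI layout */
};

/* Returns true if the page should be read from storage before the write. */
bool want_read_modify_write(const struct page_state *ps)
{
    /*
     * A page with a pending write already holds data the server has not
     * seen yet; reading it back would overwrite that data.
     */
    if (ps->uptodate || ps->has_pending_write || ps->full_page_write)
        return false;

    /* Block-based layouts can only write whole blocks, so read first.
     * Any other reasons to read first are omitted from this sketch. */
    return ps->layout_reads_whole_page;
}
```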
      Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      NFS: Fix up documentation warnings · 302fad7b
      Committed by Trond Myklebust
      Fix up some compiler warnings about function parameters, etc. not being
      correctly described or formatted.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  6. 31 Jul, 2018: 1 commit
  7. 18 Nov, 2017: 1 commit
  8. 12 Sep, 2017: 1 commit
      NFS: various changes relating to reporting IO errors. · bf4b4905
      Committed by NeilBrown
      1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
         They aren't used.
      
      2/ Make nfs_context_set_write_error() a "static inline" in internal.h
         so we can...
      
      3/ Use nfs_context_set_write_error() instead of mapping_set_error()
         if nfs_pageio_add_request() fails before sending any request.
         NFS generally keeps errors in the open_context, not the mapping,
         so this is more consistent.
      
      4/ If filemap_write_and_wait_range() reports any error, still
         check ctx->error.  The value in ctx->error is likely to be
         more useful.  As part of this, NFS_CONTEXT_ERROR_WRITE is
         cleared slightly earlier, before nfs_file_fsync_commit() is called,
         rather than at the start of that function.
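Item 4 amounts to a simple preference rule, sketched here with a hypothetical helper `select_fsync_error()` that takes negative errno values; it illustrates the rule only and is not the NFS implementation.

```c
#include <errno.h>

/*
 * Hypothetical sketch: choose the error to report from fsync.
 *   flush_err - result of flushing and waiting on the dirty range
 *               (0 or a negative errno such as -EIO)
 *   ctx_err   - write error recorded in the open context
 *               (0 or a negative errno)
 */
int select_fsync_error(int flush_err, int ctx_err)
{
    /* Check the context even if the flush itself failed: the error
     * recorded there is likely to be more specific. */
    if (ctx_err != 0)
        return ctx_err;
    return flush_err;
}
```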
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  9. 07 Sep, 2017: 2 commits
  10. 27 Jul, 2017: 2 commits
      NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43
      Committed by NeilBrown
      posix_fallocate() will allocate space in an NFS file by considering
      the last byte of every 4K block.  If it is before EOF, it will read
      the byte and if it is zero, a zero is written out.  If it is after EOF,
      the zero is unconditionally written.
      
      For the blocks beyond EOF, if NFS believes its cache is valid, it will
      expand these writes to write full pages, and then will merge the pages.
      This results in (typically) 1MB writes.  If NFS believes its cache is
      not valid (particularly if NFS_INO_INVALID_DATA or
      NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
      send the individual 1-byte writes. This results in (typically) 256 times
      as many RPC requests, and can be substantially slower.
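The factor of 256 follows from the block arithmetic: a 1 MiB region holds 1 MiB / 4 KiB = 256 blocks, and the emulation issues one 1-byte write per block. The hypothetical `count_fallocate_writes()` below counts those writes, assuming blocks aligned to file offset 0 and a region wholly beyond EOF, so no block is skipped after a read.

```c
#include <stdint.h>

/* Assumed block step used by the fallocate emulation. */
#define SKETCH_BLOCK 4096ULL

/*
 * Hypothetical model: number of 1-byte writes issued when every block
 * touched by [start, start + len) gets its last byte written.
 */
uint64_t count_fallocate_writes(uint64_t start, uint64_t len)
{
    if (len == 0)
        return 0;
    uint64_t first = start / SKETCH_BLOCK;           /* first block touched */
    uint64_t last = (start + len - 1) / SKETCH_BLOCK; /* last block touched */
    return last - first + 1;
}
```

`count_fallocate_writes(0, 1 << 20)` returns 256: that many WRITE RPCs when the 1-byte writes are sent individually, versus a single merged write of about 1 MB when the pages can be extended and merged.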
      
      Currently nfs_revalidate_mapping() is only used when reading a file or
      mmapping a file, as these are times when the content needs to be
      up-to-date.  Writes don't generally need the cache to be up-to-date, but
      writes beyond EOF can benefit, particularly in the posix_fallocate()
      case.
      
      So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
      i.e. when there is a gap between the end of the file and the start of
      the write.  If the cache is thought to be out of date (as happens after
      taking a file lock), this will cause a GETATTR, and the two flags
      mentioned above will be cleared.  With this, posix_fallocate() on a
      newly locked file does not generate excessive tiny writes.
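The new behaviour can be modelled with a small simulation. Everything here is hypothetical: `struct cached_inode`, its `cache_invalid` flag (standing in for NFS_INO_INVALID_DATA / NFS_INO_REVAL_PAGECACHE), and the `getattr_calls` counter that stands in for GETATTR round trips. Only a write that starts past the cached EOF triggers revalidation, and revalidation only costs a round trip when the cache is believed to be stale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached inode state for this sketch. */
struct cached_inode {
    uint64_t size;          /* file size as the client believes it */
    bool cache_invalid;     /* cache thought to be out of date */
    unsigned getattr_calls; /* counts simulated GETATTR round trips */
};

/* Simulated revalidation: one GETATTR refreshes the size and clears the flag. */
static void revalidate(struct cached_inode *ino, uint64_t server_size)
{
    if (!ino->cache_invalid)
        return; /* cache trusted: no round trip needed */
    ino->getattr_calls++;
    ino->size = server_size;
    ino->cache_invalid = false;
}

/* Called before a buffered write at `pos`; only a write that leaves a gap
 * after the cached EOF revalidates. */
void before_write(struct cached_inode *ino, uint64_t pos, uint64_t server_size)
{
    if (pos > ino->size)
        revalidate(ino, server_size);
}
```

In this model, once the first write past EOF has refreshed the attributes, later 1-byte writes from the fallocate emulation trigger no further GETATTRs.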
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      NFS: invalidate file size when taking a lock. · 442ce049
      Committed by NeilBrown
      Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
      for writing"), NFS would revalidate, or invalidate, the file size when
      taking a lock.  Since that commit it only invalidates the file content.
      
      If the file size is changed on the server while waiting for the lock,
      the client will have an incorrect understanding of the file size and
      could corrupt data.  This particularly happens when writing beyond the
      (supposed) end of file and can easily be demonstrated with
      posix_fallocate().
      
      If an application opens an empty file, waits for a write lock, and then
      calls posix_fallocate(), glibc will determine that the underlying
      filesystem doesn't support fallocate (assuming NFS version 4.1 or earlier)
      and will write out a '0' byte at the end of each 4K page in the region
      being fallocated that is after the end of the file.
      NFS will (usually) detect that these writes are beyond EOF and will
      expand them to cover the whole page, and then will merge the pages.
      Consequently, NFS will write out large blocks of zeroes beyond where it
      thought EOF was.  If EOF had moved, the pre-existing part of the file
      will be over-written.  Locking should have protected against this,
      but it doesn't.
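A deliberately simplified model of the failure, under stated assumptions: the region being fallocated starts at offset 0, blocks and pages are both 4 KiB, and the client zero-fills every page from the EOF it believes in up to the end of the last page it touches. `clobbered_bytes()` is a hypothetical helper returning how many bytes of real file data end up overwritten; it is not derived from the NFS code.

```c
#include <stdint.h>

/* Assumed page size (also used as the emulation's block step). */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Simplified model: fallocate of [0, len) on a file the client believes
 * ends at believed_eof, while the server's real size is real_eof.
 * Returns the number of real data bytes that get zeroed.
 */
uint64_t clobbered_bytes(uint64_t believed_eof, uint64_t real_eof, uint64_t len)
{
    /* Every block lies before the believed EOF: the emulation only reads. */
    if (len <= believed_eof)
        return 0;

    /* End of the last page touched by the region [0, len). */
    uint64_t zero_end =
        (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
    /* Real data inside the zero-filled range [believed_eof, zero_end). */
    uint64_t data_end = real_eof < zero_end ? real_eof : zero_end;

    if (data_end <= believed_eof)
        return 0;
    return data_end - believed_eof;
}
```

With a stale size of 0 and 6000 bytes really on the server, `clobbered_bytes(0, 6000, 8192)` gives 6000, the whole file; with an accurate size, `clobbered_bytes(6000, 6000, 8192)` gives 0.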
      
      This patch restores the use of nfs_zap_caches(), which invalidates the
      cached attributes.  When posix_fallocate() asks for the file size, the
      request will go to the server and get a correct answer.
      
      cc: stable@vger.kernel.org (v4.8+)
      Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  11. 27 Apr, 2017: 1 commit
  12. 21 Apr, 2017: 2 commits
  13. 25 Feb, 2017: 1 commit
  14. 25 Dec, 2016: 1 commit
  15. 20 Dec, 2016: 1 commit
  16. 10 Dec, 2016: 1 commit
      nfs_write_end(): fix handling of short copies · c0cf3ef5
      Committed by Al Viro
      What matters when deciding if we should make a page uptodate is
      not how much we _wanted_ to copy, but how much we actually have
      copied.  As it is, on architectures that do not zero the tail on a
      short copy, we can leave uninitialized data in a page marked uptodate.
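A sketch of the corrected rule on a plain byte buffer, modelled on the description above rather than copied from the kernel. `finish_write()`, `SKETCH_PAGE_SIZE` (4 KiB assumed) and `page_len` (bytes of file data in the page, 0 beyond EOF) are hypothetical. Every decision uses `copied`, the number of bytes actually copied, so a short copy can never leave uninitialized bytes in a page reported as uptodate.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical sketch: `copied` bytes have been copied into a page that was
 * not uptodate, starting at `offset`. Zero the bytes that hold no valid data
 * and report whether the whole page may now be treated as uptodate.
 * Assumes offset + copied <= SKETCH_PAGE_SIZE and page_len <= SKETCH_PAGE_SIZE.
 */
bool finish_write(unsigned char page[SKETCH_PAGE_SIZE], unsigned int offset,
                  unsigned int copied, unsigned int page_len)
{
    /* Use what was actually copied, never the requested length. */
    unsigned int end = offset + copied;

    if (page_len == 0) {
        /* No file data here: everything outside [offset, end) is zero. */
        memset(page, 0, offset);
        memset(page + end, 0, SKETCH_PAGE_SIZE - end);
        return true;
    }
    if (end >= page_len) {
        /* Bytes after what we wrote are beyond EOF: zero them. */
        memset(page + end, 0, SKETCH_PAGE_SIZE - end);
        return offset == 0;
    }
    /* Only bytes past EOF can be zeroed; [end, page_len) is still unknown. */
    memset(page + page_len, 0, SKETCH_PAGE_SIZE - page_len);
    return false;
}
```

For example, a write that asked for 2000 bytes at offset 0 into a page with 2000 bytes of existing data, but managed to copy only 100, gets false from `finish_write(pg, 0, 100, 2000)`, so the page is not marked uptodate.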
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  17. 05 Dec, 2016: 1 commit
  18. 06 Oct, 2016: 1 commit
  19. 23 Sep, 2016: 1 commit
  20. 20 Sep, 2016: 1 commit
  21. 04 Sep, 2016: 1 commit
  22. 20 Jul, 2016: 1 commit
      sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e
      Committed by Scott Mayhew
      A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
      not really safe to use the generic_cred->acred->ac_flags to store
      the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
      KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
      KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
      with the auth_cred to be in a state where they're perpetually doing 4K
      NFS_FILE_SYNC writes.
      
      This can be reproduced as follows:
      
      1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
      They do not need to be the same export, nor do they even need to be from
      the same NFS server.  Also, v3 is fine.
      $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
      $ sudo mount -o v3,sec=sys server2:/export /mnt/sys
      
      2. As the normal user, before accessing the kerberized mount, kinit with
      a short lifetime (but not so short that renewing the ticket would leave
      you within the 4-minute window again by the time the original ticket
      expires), e.g.
      $ kinit -l 10m -r 60m
      
      3. Do some I/O to the kerberized mount and verify that the writes are
      wsize, UNSTABLE:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      4. Wait until you're within 4 minutes of key expiry, then do some more
      I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
      set.  Verify that the writes are 4K, FILE_SYNC:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      5. Now do some I/O to the sec=sys mount.  This will cause
      RPC_CRED_NO_CRKEY_TIMEOUT to be set:
      $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
      
      6. Writes for that user will now permanently be 4K, FILE_SYNC,
      regardless of which mount is being written to, until you reboot
      the client.  Renewing the Kerberos ticket (assuming it hasn't already
      expired) has no effect, and neither does obtaining a new Kerberos
      ticket at this point.
      
      Move the flag to the auth->au_flags field (which is currently unused)
      and rename it slightly to reflect that it's no longer associated with
      the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
      rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
      add the inode to the arg list of nfs_ctx_key_to_expire so we can
      determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
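A sketch of the corrected check, with hypothetical types and flag names standing in for struct rpc_auth, struct rpc_cred and the real flag bits. Because the "no key timeout" property now lives in the auth flavor and is consulted first, it cannot be mixed with another flavor's per-credential expiry state.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch (separate flag words). */
#define AUTH_NO_CRKEY_TIMEOUT 0x1u /* per-auth: flavor has no key expiry */
#define CRED_KEY_EXPIRE_SOON  0x2u /* per-cred: key is about to expire */

/* Hypothetical stand-ins for the auth flavor and a credential. */
struct auth_sketch { unsigned int au_flags; };
struct cred_sketch { unsigned int cr_flags; };

/*
 * Returns true if the credential's key is about to expire, i.e. writes
 * should fall back to small FILE_SYNC writes.
 */
bool cred_key_to_expire(const struct auth_sketch *auth,
                        const struct cred_sketch *cred)
{
    /* Flavors without key expiry (e.g. sec=sys) never take this path. */
    if (auth->au_flags & AUTH_NO_CRKEY_TIMEOUT)
        return false;
    return (cred->cr_flags & CRED_KEY_EXPIRE_SOON) != 0;
}
```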
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  23. 06 Jul, 2016: 5 commits
  24. 22 Jun, 2016: 4 commits
  25. 02 May, 2016: 1 commit
  26. 05 Apr, 2016: 1 commit
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
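The identities behind these transformations can be checked in userspace. The macros below mirror the legacy definitions, which simply aliased the PAGE_* constants, with a 4 KiB page assumed; `old_style()` and `new_style()` are hypothetical helpers. Since PAGE_CACHE_SHIFT equals PAGE_SHIFT, shifting by their difference is a shift by zero, which is why the script can drop it.

```c
/* Userspace stand-ins for the kernel macros (4 KiB pages assumed). */
#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PAGE_MASK     (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

/* The legacy names were plain aliases of the page constants... */
#define PAGE_CACHE_SHIFT    PAGE_SHIFT
#define PAGE_CACHE_SIZE     PAGE_SIZE
#define PAGE_CACHE_MASK     PAGE_MASK
#define PAGE_CACHE_ALIGN(x) PAGE_ALIGN(x)

/* ...so shifting by (PAGE_CACHE_SHIFT - PAGE_SHIFT) shifts by 0. */
unsigned long old_style(unsigned long idx)
{
    return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* What the coccinelle rule rewrites the expression to. */
unsigned long new_style(unsigned long idx)
{
    return idx;
}
```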
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 17 Mar, 2016: 1 commit