1. 29 7月, 2017 6 次提交
  2. 27 7月, 2017 3 次提交
    • N
      NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43
      NeilBrown 提交于
      posix_fallocate() will allocate space in an NFS file by considering
      the last byte of every 4K block.  If it is before EOF, it will read
      the byte and if it is zero, a zero is written out.  If it is after EOF,
      the zero is unconditionally written.
      
      For the blocks beyond EOF, if NFS believes its cache is valid, it will
      expand these writes to write full pages, and then will merge the pages.
      This results if (typically) 1MB writes.  If NFS believes its cache is
      not valid (particularly if NFS_INO_INVALID_DATA or
      NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
      send the individual 1-byte writes. This results in (typically) 256 times
      as many RPC requests, and can be substantially slower.
      
      Currently nfs_revalidate_mapping() is only used when reading a file or
      mmapping a file, as these are times when the content needs to be
      up-to-date.  Writes don't generally need the cache to be up-to-date, but
      writes beyond EOF can benefit, particularly in the posix_fallocate()
      case.
      
      So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
      i.e. when there is a gap between the end of the file and the start of
      the write.  If the cache is thought to be out of date (as happens after
      taking a file lock), this will cause a GETATTR, and the two flags
      mentioned above will be cleared.  With this, posix_fallocate() on a
      newly locked file does not generate excessive tiny writes.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      6ba80d43
    • N
      NFS: invalidate file size when taking a lock. · 442ce049
      NeilBrown 提交于
      Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
      for writing"), NFS would revalidate, or invalidate, the file size when
      taking a lock.  Since that commit it only invalidates the file content.
      
      If the file size is changed on the server while wait for the lock, the
      client will have an incorrect understanding of the file size and could
      corrupt data.  This particularly happens when writing beyond the
      (supposed) end of file and can be easily be demonstrated with
      posix_fallocate().
      
      If an application opens an empty file, waits for a write lock, and then
      calls posix_fallocate(), glibc will determine that the underlying
      filesystem doesn't support fallocate (assuming version 4.1 or earlier)
      and will write out a '0' byte at the end of each 4K page in the region
      being fallocated that is after the end of the file.
      NFS will (usually) detect that these writes are beyond EOF and will
      expand them to cover the whole page, and then will merge the pages.
      Consequently, NFS will write out large blocks of zeroes beyond where it
      thought EOF was.  If EOF had moved, the pre-existing part of the file
      will be over-written.  Locking should have protected against this,
      but it doesn't.
      
      This patch restores the use of nfs_zap_caches() which invalidated the
      cached attributes.  When posix_fallocate() asks for the file size, the
      request will go to the server and get a correct answer.
      
      cc: stable@vger.kernel.org (v4.8+)
      Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      442ce049
    • A
      NFS: Use raw NFS access mask in nfs4_opendata_access() · 1e6f2095
      Anna Schumaker 提交于
      Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache") changed how the access results are stored after an
      access() call.  An NFS v4 OPEN might have access bits returned with the
      opendata, so we should use the NFS4_ACCESS values when determining the
      return value in nfs4_opendata_access().
      
      Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
      access cache")
      Reported-by: NEryu Guan <eguan@redhat.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Tested-by: NTakashi Iwai <tiwai@suse.de>
      1e6f2095
  3. 26 7月, 2017 1 次提交
  4. 25 7月, 2017 1 次提交
  5. 24 7月, 2017 4 次提交
    • B
      xfs: fix quotacheck dquot id overflow infinite loop · cfaf2d03
      Brian Foster 提交于
      If a dquot has an id of U32_MAX, the next lookup index increment
      overflows the uint32_t back to 0. This starts the lookup sequence
      over from the beginning, repeats indefinitely and results in a
      livelock.
      
      Update xfs_qm_dquot_walk() to explicitly check for the lookup
      overflow and exit the loop.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      cfaf2d03
    • N
      btrfs: round down size diff when shrinking/growing device · 0e4324a4
      Nikolay Borisov 提交于
      Further testing showed that the fix introduced in 7dfb8be1 ("btrfs:
      Round down values which are written for total_bytes_size") was
      insufficient and it could still lead to discrepancies between the
      total_bytes in the super block and the device total bytes. So this patch
      also ensures that the difference between old/new sizes when
      shrinking/growing is also rounded down. This ensure that we won't be
      subtracting/adding a non-sectorsize multiples to the superblock/device
      total sizees.
      
      Fixes: 7dfb8be1 ("btrfs: Round down values which are written for total_bytes_size")
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0e4324a4
    • O
      Btrfs: fix early ENOSPC due to delalloc · 17024ad0
      Omar Sandoval 提交于
      If a lot of metadata is reserved for outstanding delayed allocations, we
      rely on shrink_delalloc() to reclaim metadata space in order to fulfill
      reservation tickets. However, shrink_delalloc() has a shortcut where if
      it determines that space can be overcommitted, it will stop early. This
      made sense before the ticketed enospc system, but now it means that
      shrink_delalloc() will often not reclaim enough space to fulfill any
      tickets, leading to an early ENOSPC. (Reservation tickets don't care
      about being able to overcommit, they need every byte accounted for.)
      
      Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
      all of the metadata it is supposed to. This fixes early ENOSPCs we were
      seeing when doing a btrfs receive to populate a new filesystem, as well
      as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
      
      Fixes: 957780eb ("Btrfs: introduce ticketed enospc infrastructure")
      Tested-by: NChristoph Anton Mitterer <mail@christoph.anton.mitterer.name>
      Cc: stable@vger.kernel.org
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      17024ad0
    • J
      btrfs: fix lockup in find_free_extent with read-only block groups · 14443937
      Jeff Mahoney 提交于
      If we have a block group that is all of the following:
      1) uncached in memory
      2) is read-only
      3) has a disk cache state that indicates we need to recreate the cache
      
      AND the file system has enough free space fragmentation such that the
      request for an extent of a given size can't be honored;
      
      AND have a single CPU core;
      
      AND it's the block group with the highest starting offset such that
      there are no opportunities (like reading from disk) for the loop to
      yield the CPU;
      
      We can end up with a lockup.
      
      The root cause is simple.  Once we're in the position that we've read in
      all of the other block groups directly and none of those block groups
      can honor the request, there are no more opportunities to sleep.  We end
      up trying to start a caching thread which never gets run if we only have
      one core.  This *should* present as a hung task waiting on the caching
      thread to make some progress, but it doesn't.  Instead, it degrades into
      a busy loop because of the placement of the read-only check.
      
      During the first pass through the loop, block_group->cached will be set
      to BTRFS_CACHE_STARTED and have_caching_bg will be set.  Then we hit the
      read-only check and short circuit the loop.  We're not yet in
      LOOP_CACHING_WAIT, so we skip that loop back before going through the
      loop again for other raid groups.
      
      Then we move to LOOP_CACHING_WAIT state.
      
      During the this pass through the loop, ->cached will still be
      BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter
      cache_block_group, do a lot of nothing, and return, and also set
      have_caching_bg again.  Then we hit the read-only check and short circuit
      the loop.  The same thing happens as before except now we DO trigger
      the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the
      top.  We do this forever.
      
      There are two fixes in this patch since they address the same underlying
      bug.
      
      The first is to add a cond_resched to the end of the loop to ensure
      that the caching thread always has an opportunity to run.  This will
      fix the soft lockup issue, but find_free_extent will still loop doing
      nothing until the thread has completed.
      
      The second is to move the read-only check to the top of the loop.  We're
      never going to return an allocation within a read-only block group so
      we may as well skip it early.  The check for ->cached == BTRFS_CACHE_ERROR
      would cause the same problem except that BTRFS_CACHE_ERROR is considered
      a "done" state and we won't re-set have_caching_bg again.
      
      Many thanks to Stephan Kulow <coolo@suse.de> for his excellent help in
      the testing process.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      14443937
  6. 22 7月, 2017 1 次提交
  7. 21 7月, 2017 8 次提交
  8. 20 7月, 2017 9 次提交
  9. 19 7月, 2017 3 次提交
    • E
      jfs: preserve i_mode if __jfs_set_acl() fails · f070e5ac
      Ernesto A. Fernández 提交于
      When changing a file's acl mask, __jfs_set_acl() will first set the group
      bits of i_mode to the value of the mask, and only then set the actual
      extended attribute representing the new acl.
      
      If the second part fails (due to lack of space, for example) and the file
      had no acl attribute to begin with, the system will from now on assume
      that the mask permission bits are actual group permission bits, potentially
      granting access to the wrong users.
      
      Prevent this by only changing the inode mode after the acl has been set.
      Signed-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      f070e5ac
    • J
      jfs: Don't clear SGID when inheriting ACLs · 9bcf66c7
      Jan Kara 提交于
      When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
      set, DIR1 is expected to have SGID bit set (and owning group equal to
      the owning group of 'DIR0'). However when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
      'DIR1' to get cleared if user is not member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of
      __jfs_set_acl() into jfs_set_acl(). That way the function will not be
      called when inheriting ACLs which is what we want as it prevents SGID
      bit clearing and the mode has been properly set by posix_acl_create()
      anyway.
      
      Fixes: 07393101
      CC: stable@vger.kernel.org
      CC: jfs-discussion@lists.sourceforge.net
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      9bcf66c7
    • J
      hfsplus: Don't clear SGID when inheriting ACLs · 84969465
      Jan Kara 提交于
      When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
      set, DIR1 is expected to have SGID bit set (and owning group equal to
      the owning group of 'DIR0'). However when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
      'DIR1' to get cleared if user is not member of the owning group.
      
      Fix the problem by creating __hfsplus_set_posix_acl() function that does
      not call posix_acl_update_mode() and use it when inheriting ACLs. That
      prevents SGID bit clearing and the mode has been properly set by
      posix_acl_create() anyway.
      
      Fixes: 07393101
      CC: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      84969465
  10. 18 7月, 2017 4 次提交