1. 27 October 2017: 10 commits
  2. 17 October 2017: 1 commit
    • xfs: trim writepage mapping to within eof · 40214d12
      Authored by Brian Foster
      The writeback rework in commit fbcc0256 ("xfs: Introduce
      writeback context for writepages") introduced a subtle change in
      behavior with regard to the block mapping used across the
      ->writepages() sequence. The previous xfs_cluster_write() code would
      only flush pages up to EOF at the time of the writepage, thus
      ensuring that any pages due to file-extending writes would be
      handled on a separate cycle and with a new, updated block mapping.
      
      The updated code establishes a block mapping in xfs_writepage_map()
      that could extend beyond EOF if the file has post-eof preallocation.
      Because we now use the generic writeback infrastructure and pass the
      cached mapping to each writepage call, there is no implicit EOF
      limit in place. If eofblocks trimming occurs during ->writepages(),
      any post-eof portion of the cached mapping becomes invalid. The
      eofblocks code has no means to serialize against writeback because
      there are no pages associated with post-eof blocks. Therefore if an
      eofblocks trim occurs and is followed by a file-extending buffered
      write, not only has the mapping become invalid, but we could end up
      writing a page to disk based on the invalid mapping.
      
      Consider the following sequence of events:
      
      - A buffered write creates a delalloc extent and post-eof
        speculative preallocation.
      - Writeback starts and on the first writepage cycle, the delalloc
        extent is converted to real blocks (including the post-eof blocks)
        and the mapping is cached.
      - The file is closed and xfs_release() trims post-eof blocks. The
        cached writeback mapping is now invalid.
      - Another buffered write appends the file with a delalloc extent.
      - The concurrent writeback cycle picks up the just written page
        because the writeback range end is LLONG_MAX. xfs_writepage_map()
        attributes it to the (now invalid) cached mapping and writes the
        data to an incorrect location on disk (and where the file offset is
        still backed by a delalloc extent).
      
      This problem is reproduced by xfstests test generic/464, which
      triggers racing writes, appends, open/closes and writeback requests.
      
      To address this problem, trim the mapping used during writeback to
      within EOF when the mapping is validated. This ensures the mapping
      is revalidated for any pages encountered beyond EOF as of the time
      the current mapping was cached or last validated (see the sketch
      after this entry).
      Reported-by: Eryu Guan <eguan@redhat.com>
      Diagnosed-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      40214d12
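
      The fix can be pictured with a minimal standalone C sketch below. The
      types and helper names are simplified stand-ins, not the actual kernel
      code: the idea is only that the cached mapping is clamped to EOF when
      it is validated, so any page beyond that point no longer matches and
      forces a fresh block lookup.

          #include <stdbool.h>
          #include <stdint.h>

          struct block_map {
              uint64_t start_fsb;   /* first filesystem block of the mapping */
              uint64_t length_fsb;  /* mapping length in filesystem blocks   */
          };

          /* Clamp the mapping to [0, eof_fsb); empty means "must revalidate". */
          static void trim_map_to_eof(struct block_map *map, uint64_t eof_fsb)
          {
              if (map->start_fsb >= eof_fsb) {
                  map->length_fsb = 0;              /* wholly post-EOF: discard */
                  return;
              }
              if (map->start_fsb + map->length_fsb > eof_fsb)
                  map->length_fsb = eof_fsb - map->start_fsb;
          }

          /* A cached mapping may only be reused for blocks it still covers. */
          static bool map_covers(const struct block_map *map, uint64_t offset_fsb)
          {
              return map->length_fsb != 0 &&
                     offset_fsb >= map->start_fsb &&
                     offset_fsb <  map->start_fsb + map->length_fsb;
          }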
  3. 12 October 2017: 1 commit
  4. 27 September 2017: 1 commit
  5. 02 September 2017: 10 commits
  6. 26 July 2017: 1 commit
  7. 21 July 2017: 1 commit
  8. 28 June 2017: 1 commit
  9. 19 June 2017: 1 commit
    • xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent · e1a4e37c
      Authored by Darrick J. Wong
      In a pathological scenario where we are trying to bunmapi a single
      extent in which every other block is shared, it's possible that trying
      to unmap the entire large extent in a single transaction can generate so
      many EFIs that we overflow the transaction reservation.
      
      Therefore, use a heuristic to guess at the number of blocks we can
      safely unmap from a reflink file's data fork in a single transaction.
      This should prevent problems such as the log head slamming into the tail
      and ASSERTs that trigger because we've exceeded the transaction
      reservation.
      
      Note that since bunmapi can fail to unmap the entire range, we must also
      teach the deferred unmap code to roll into a new transaction whenever we
      get low on reservation (see the sketch after this entry).
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      [hch: random edits, all bugs are my fault]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      e1a4e37c
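
      A rough standalone sketch of the idea follows. The constants and names
      are purely illustrative (they are not the kernel's reservation math); it
      only shows the shape of the heuristic: bound the blocks unmapped per
      transaction so that the worst case of every other block being shared
      cannot generate more log items than a fraction of the reservation can
      hold, and let the caller roll to a fresh transaction for the remainder.

          #include <stdint.h>

          /* Illustrative assumptions, not real XFS numbers. */
          #define LOG_ITEM_COST_BYTES  128u  /* worst-case log space per deferred item */
          #define RESERVATION_SHARE    2u    /* use at most half of the reservation    */

          /* Worst case: every other block shared => one deferred item per two blocks. */
          static uint64_t max_unmap_blocks(uint64_t tx_reservation_bytes)
          {
              uint64_t items  = (tx_reservation_bytes / RESERVATION_SHARE)
                                      / LOG_ITEM_COST_BYTES;
              uint64_t blocks = items * 2;

              return blocks ? blocks : 1;    /* always make forward progress */
          }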
  10. 17 May 2017: 2 commits
    • xfs: fix warnings about unused stack variables · 6e747506
      Authored by Darrick J. Wong
      Reduce stack usage and get rid of compiler warnings by eliminating
      unused variables.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      6e747506
    • xfs: fix indlen accounting error on partial delalloc conversion · 0daaecac
      Authored by Brian Foster
      The delalloc -> real block conversion path uses an incorrect
      calculation in the case where the middle part of a delalloc extent
      is being converted. This is documented as a rare situation because
      XFS generally attempts to maximize contiguity by converting as much
      of a delalloc extent as possible.
      
      If this situation does occur, the indlen reservation for the two new
      delalloc extents left behind by the conversion of the middle range
      is calculated and compared with the original reservation. If more
      blocks are required, the delta is allocated from the global block
      pool. This delta value can be characterized as the difference
      between the new total requirement (temp + temp2) and the currently
      available reservation minus those blocks that have already been
      allocated (startblockval(PREV.br_startblock) - allocated).
      
      The problem is that the current code does not account for previously
      allocated blocks correctly. It subtracts the current allocation
      count from the (new - old) delta rather than the old indlen
      reservation. This means that more indlen blocks than have been
      allocated end up stashed in the remaining extents and free space
      accounting is broken as a result.
      
      Fix up the calculation to subtract the allocated block count from
      the original extent indlen and thus correctly allocate the
      reservation delta based on the difference between the new total
      requirement and the unused blocks from the original reservation.
      Also remove a bogus assert that contradicts the fact that the new
      indlen reservation can be larger than the original one (see the
      sketch of the corrected calculation after this entry).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0daaecac
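
      The corrected arithmetic can be written out as a small standalone C
      sketch. The names temp/temp2, startblockval() and "allocated" follow
      the commit text; everything else is simplified and illustrative:

          #include <stdint.h>

          /*
           * Blocks that must be taken from the global pool when the middle of a
           * delalloc extent is converted and two smaller delalloc extents remain.
           */
          static int64_t indlen_delta(uint64_t new_left_indlen,    /* "temp"              */
                                      uint64_t new_right_indlen,   /* "temp2"             */
                                      uint64_t orig_indlen,        /* startblockval(PREV) */
                                      uint64_t already_allocated)  /* "allocated"         */
          {
              uint64_t required = new_left_indlen + new_right_indlen;
              uint64_t unused   = orig_indlen > already_allocated ?
                                  orig_indlen - already_allocated : 0;

              /*
               * Buggy form: (required - orig_indlen) - already_allocated, which
               * leaves too many indlen blocks stashed in the remaining extents.
               * Fixed form: charge the pool only for what the unused part of the
               * original reservation cannot cover.
               */
              return (int64_t)required - (int64_t)unused;
          }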
  11. 26 April 2017: 8 commits
  12. 09 March 2017: 2 commits
    • xfs: try any AG when allocating the first btree block when reflinking · 2fcc319d
      Authored by Christoph Hellwig
      When a reflink operation causes the bmap code to allocate a btree block,
      we currently do a single-AG allocation due to having ->firstblock set,
      and then try any higher AG due to a little reflink quirk we put in when
      adding the reflink code.  But given that we do not have a minleft
      reservation of any kind in this AG, we can still fail to find space in
      the same or a higher AG even if the file system has enough free space.
      To fix this, use an XFS_ALLOCTYPE_FIRST_AG allocation in this fallback
      path instead.
      
      [And yes, we need to redo this properly instead of piling hacks over
       hacks.  I'm working on that, but it's not going to be a small series.
       In the meantime this fixes the customer reported issue]
      
      Also add a warning for failed allocations to make them easier to debug
      (see the sketch after this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      2fcc319d
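
      A standalone sketch of the fallback policy follows. The enum, the
      try_alloc callback and alloc_first_btree_block() are hypothetical names,
      not the kernel API; only the order of attempts mirrors the commit:

          #include <stdbool.h>
          #include <stdint.h>
          #include <stdio.h>

          enum alloc_type {
              ALLOC_THIS_AG,   /* only the AG implied by the ->firstblock hint  */
              ALLOC_FIRST_AG,  /* unconstrained: scan every AG from AG 0 upward */
          };

          /* Hypothetical allocator callback; returns false when no space is found. */
          typedef bool (*alloc_fn)(enum alloc_type type, uint64_t *out_fsb);

          /* Constrained attempt first, then a whole-filesystem scan, then warn. */
          static bool alloc_first_btree_block(alloc_fn try_alloc, uint64_t *out_fsb)
          {
              if (try_alloc(ALLOC_THIS_AG, out_fsb))
                  return true;
              if (try_alloc(ALLOC_FIRST_AG, out_fsb))
                  return true;
              /* The commit also adds a warning to make such failures debuggable. */
              fprintf(stderr, "bmap btree block allocation failed\n");
              return false;
          }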
    • xfs: use iomap new flag for newly allocated delalloc blocks · f65e6fad
      Authored by Brian Foster
      Commit fa7f138a ("xfs: clear delalloc and cache on buffered write
      failure") fixed one regression in the iomap error handling code and
      exposed another. The fundamental problem is that if a buffered write
      is a rewrite of preexisting delalloc blocks and the write fails, the
      failure handling code can punch out preexisting blocks with valid
      file data.
      
      This was reproduced directly by sub-block writes in the LTP
      kernel/syscalls/write/write03 test. A first 100 byte write allocates
      a single block in a file. A subsequent 100 byte write fails and
      punches out the block, including the data successfully written by
      the previous write.
      
      To address this problem, update the ->iomap_begin() handler to
      distinguish newly allocated delalloc blocks from preexisting
      delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
      ->iomap_end() handler to decide when a failed or short write should
      punch out delalloc blocks.
      
      This introduces the subtle requirement that ->iomap_begin() should
      never combine newly allocated delalloc blocks with existing blocks
      in the resulting iomap descriptor. This can occur when a new
      delalloc reservation merges with a neighboring extent that is part
      of the current write, for example. Therefore, drop the
      post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
      just return the record inserted into the fork. This ensures only new
      blocks are returned and thus that preexisting delalloc blocks are
      always handled as "found" blocks and not punched out on a failed
      rewrite (see the sketch after this entry).
      Reported-by: Xiong Zhou <xzhou@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      f65e6fad
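
      A minimal standalone sketch of the flag-driven punch decision follows.
      The structures and names are simplified stand-ins for the iomap
      interfaces; only the "mark new blocks and punch only those" idea is
      taken from the commit:

          #include <stdbool.h>
          #include <stdint.h>

          #define MAP_F_NEW  (1u << 0)   /* blocks were reserved by this call */

          struct simple_iomap {
              uint64_t offset;    /* byte offset the mapping starts at */
              uint64_t length;    /* mapped length in bytes            */
              unsigned flags;
          };

          /* begin hook: only a brand-new delalloc reservation is marked new. */
          static void write_begin(struct simple_iomap *map, bool newly_allocated)
          {
              map->flags = newly_allocated ? MAP_F_NEW : 0;
          }

          /*
           * end hook: a short or failed write punches out the unused range only
           * when the blocks were newly reserved; preexisting delalloc blocks,
           * which may back valid data, are left alone.
           */
          static bool should_punch(const struct simple_iomap *map,
                                   uint64_t bytes_written, uint64_t bytes_requested)
          {
              return (map->flags & MAP_F_NEW) && bytes_written < bytes_requested;
          }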
  13. 17 February 2017: 1 commit
    • xfs: tune down agno asserts in the bmap code · 410d17f6
      Authored by Christoph Hellwig
      In various places we currently assert that xfs_bmap_btalloc allocates
      from the same AG as the firstblock value passed in, unless it is either
      NULLAGNO or the dop_low flag is set.  But the reflink code does not
      fully follow this convention, as it passes in firstblock purely as a
      hint for the allocator without actually having previous allocations in
      the transaction, and without having a minleft check on the current AG,
      leading to the assert firing on a very full and heavily used file
      system.  As even the reflink code only allocates from equal or higher
      AGs for now, we can simplify the check to always allow for equal or
      higher AGs.
      
      Note that we need to eventually split the two meanings of the firstblock
      value.  At that point we can also allow the reflink code to allocate
      from any AG instead of limiting it in any way (see the sketch after
      this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      410d17f6
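
      A standalone sketch of the relaxed check (NULLAGNO and dop_low mirror
      the commit text; the function itself is an illustrative simplification):

          #include <stdbool.h>
          #include <stdint.h>

          #define NULLAGNO  ((uint32_t)-1)

          /* Accept an allocation from the same or any higher AG than the hint. */
          static bool alloc_agno_ok(uint32_t firstblock_agno, uint32_t alloc_agno,
                                    bool dop_low)
          {
              if (firstblock_agno == NULLAGNO || dop_low)
                  return true;                       /* no ordering constraint   */
              return alloc_agno >= firstblock_agno;  /* equal or higher AG is ok */
          }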