1. 20 Jan 2020 (2 commits)
  2. 15 Jan 2020 (1 commit)
    • gfs2: Avoid access time thrashing in gfs2_inode_lookup · 2b0fb353
      Committed by Andreas Gruenbacher
      In gfs2_inode_lookup, we initialize inode->i_atime to the lowest
      possible value after gfs2_inode_refresh may already have been called.
      This should be the other way around, but we didn't notice because
      usually the inode type is known from the directory entry, so
      gfs2_inode_lookup won't call gfs2_inode_refresh.
      
      In addition, only initialize ip->i_no_formal_ino from no_formal_ino when
      actually needed.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
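      An illustrative sketch of the corrected ordering (editorial addition,
      not part of the patch description; the two assignments roughly mirror
      what gfs2_inode_lookup does, the wrapper function is made up):

          #include <linux/fs.h>

          /* Sketch: start i_atime at the lowest representable value before
           * any refresh, so the value read from disk by gfs2_inode_refresh()
           * is never clobbered afterwards. */
          static void example_prepare_atime(struct inode *inode)
          {
              inode->i_atime.tv_sec = 1LL << (8 * sizeof(inode->i_atime.tv_sec) - 1);
              inode->i_atime.tv_nsec = 0;
              /* ... gfs2_inode_refresh() would run after this point ... */
          }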
  3. 09 Jan 2020 (1 commit)
  4. 08 Jan 2020 (1 commit)
  5. 07 Jan 2020 (1 commit)
    • gfs2: Another gfs2_find_jhead fix · eed0f953
      Committed by Andreas Gruenbacher
      On filesystems with a block size smaller than the page size,
      gfs2_find_jhead can split a page across two bios (for example, when
      blocks are not allocated consecutively).  When that happens, the first
      bio that completes will unlock the page in its bi_end_io handler even
      though the page hasn't been read completely yet.  Fix that by using a
      chained bio for the rest of the page.
      
      While at it, clean up the sector calculation logic in
      gfs2_log_alloc_bio.  In gfs2_find_jhead, simplify the disk block and
      offset calculation logic and fix a variable name.
      
      Fixes: f4686c26 ("gfs2: read journal in large chunks")
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
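      A rough sketch of the chaining idea (editorial illustration; bio_alloc,
      bio_copy_dev, bio_end_sector, bio_chain and submit_bio are block-layer
      APIs of the kernels these patches target, the function name and its
      surrounding context are made up):

          #include <linux/bio.h>
          #include <linux/blkdev.h>

          /* Continue the read for the rest of the page in a new bio while
           * keeping the previous bio's completion (which unlocks the page)
           * pending until both bios have finished. */
          static struct bio *example_chain_bio(struct bio *prev, unsigned int nr_vecs)
          {
              struct bio *new;

              new = bio_alloc(GFP_NOIO, nr_vecs);
              bio_copy_dev(new, prev);
              new->bi_iter.bi_sector = bio_end_sector(prev);
              new->bi_opf = prev->bi_opf;

              /* prev's bi_end_io will not run until new completes, too. */
              bio_chain(new, prev);
              submit_bio(prev);
              return new;
          }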
  6. 21 Nov 2019 (2 commits)
  7. 20 Nov 2019 (1 commit)
    • gfs2: clean up iopen glock mess in gfs2_create_inode · 2c47c1be
      Committed by Bob Peterson
      Before this patch, gfs2_create_inode had a use-after-free for the
      iopen glock in some error paths because it did this:
      
      	gfs2_glock_put(io_gl);
      fail_gunlock2:
      	if (io_gl)
      		clear_bit(GLF_INODE_CREATING, &io_gl->gl_flags);
      
      In some cases, the io_gl was used for create and only had one
      reference, so the glock might be freed before the clear_bit().
      This patch tries to straighten it out by only jumping to the
      error paths where iopen is properly set, and moving the
      gfs2_glock_put after the clear_bit.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
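      A minimal sketch of the safe ordering (editorial illustration, not the
      full patch; gfs2_glock_put, clear_bit, GLF_INODE_CREATING and io_gl are
      the symbols quoted above):

          if (io_gl) {
              /* Clear the bit while we still hold a reference ... */
              clear_bit(GLF_INODE_CREATING, &io_gl->gl_flags);
              /* ... and drop the reference last; it may free the glock. */
              gfs2_glock_put(io_gl);
          }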
  8. 16 Nov 2019 (2 commits)
    • gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS · d99724c3
      Committed by Bob Peterson
      This patch closes a timing window in which two processes compete
      and overlap in the execution of do_xmote for the same glock:
      
                   Process A                              Process B
         ------------------------------------   -----------------------------
      1. Grabs gl_lockref and calls do_xmote
      2.                                        Grabs gl_lockref but is blocked
      3. Sets GLF_INVALIDATE_IN_PROGRESS
      4. Unlocks gl_lockref
      5.                                        Calls do_xmote
      6. Call glops->go_sync
      7. test_and_clear_bit GLF_DIRTY
      8. Call gfs2_log_flush                    Call glops->go_sync
      9. (slow IO, so it blocks a long time)    test_and_clear_bit GLF_DIRTY
                                                It's not dirty (step 7) returns
      10.                                       Tests GLF_INVALIDATE_IN_PROGRESS
      11.                                       Calls go_inval (rgrp_go_inval)
      12.                                       gfs2_rgrp_brelse does brelse
      13.                                       truncate_inode_pages_range
      14.                                       Calls lm_lock UN
      
      In step 14 we've just told dlm to give the glock to another node
      when, in fact, process A has not yet finished the IO, synced all
      buffer_heads to disk, and made sure their revokes are done.
      
      This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
      to use test_and_set_bit, and if the bit is already set, process B just
      ignores it and trusts that process A will do the do_xmote in the proper
      order.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
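      A sketch of the fixed check (editorial illustration, simplified from
      the do_xmote flow described above; gl->gl_flags and
      GLF_INVALIDATE_IN_PROGRESS are gfs2's in-core glock symbols):

          if (!test_and_set_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags)) {
              /* We won the race: this caller does the go_sync/go_inval
               * work and eventually clears the bit again. */
          }
          /* A caller that finds the bit already set simply skips the work
           * and trusts the first caller to finish in the proper order. */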
    • gfs2: Abort gfs2_freeze if io error is seen · 52b1cdcb
      Committed by Bob Peterson
      Before this patch, an I/O error such as -EIO while writing to the
      journal would cause function gfs2_freeze to go into an infinite loop,
      continuously retrying the freeze operation. But nothing ever clears
      the -EIO except unmounting after withdraw, which is impossible if the
      freeze operation never ends (fails). Instead you get:
      
      [ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
      [ 6499.773058] gfs2: fsid=dm-32.0: retrying...
      [ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
      [ 6500.797015] gfs2: fsid=dm-32.0: retrying...
      
      This patch adds a check for -EIO in gfs2_freeze; if it is seen, the
      freeze glock is dequeued, the loop is aborted, and the error is
      returned. Also, there is no need to pass the freeze holder to function
      gfs2_lock_fs_check_clean since it is only called from one place, with
      a well-known superblock pointer, so this patch simplifies that as well.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
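      A sketch of the resulting loop shape (editorial illustration with
      simplified error handling; gfs2_lock_fs_check_clean, fs_err, msleep and
      gfs2_glock_dq_uninit are the real helpers, freeze_gh stands in for the
      freeze glock holder):

          for (;;) {
              error = gfs2_lock_fs_check_clean(sdp);
              if (!error)
                  break;
              if (error == -EIO) {
                  /* The journal already hit an I/O error: drop the freeze
                   * glock and give up instead of retrying forever. */
                  gfs2_glock_dq_uninit(&freeze_gh);
                  return error;
              }
              fs_err(sdp, "error freezing FS: %d\n", error);
              fs_err(sdp, "retrying...\n");
              msleep(1000);
          }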
  9. 15 Nov 2019 (3 commits)
  10. 14 Nov 2019 (2 commits)
  11. 12 Nov 2019 (1 commit)
    • gfs2: Remove active journal side effect from gfs2_write_log_header · 19ebc050
      Committed by Andreas Gruenbacher
      Function gfs2_write_log_header can be used to write a log header into any of
      the journals of a filesystem.  When used on the node's own journal,
      gfs2_write_log_header advances the current position in the log
      (sdp->sd_log_flush_head) as a side effect, through function gfs2_log_bmap.
      
      This is confusing, and it also means that we can't use gfs2_log_bmap for other
      journals even if they have an extent map.  So clean this mess up by not
      advancing sdp->sd_log_flush_head in gfs2_write_log_header or gfs2_log_bmap
      anymore and making that a responsibility of the callers instead.
      
      This is related to commit 7c70b896 ("gfs2: clean_journal improperly set
      sd_log_flush_head").
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  12. 08 Nov 2019 (3 commits)
    • gfs2: Fix end-of-file handling in gfs2_page_mkwrite · 184b4e60
      Committed by Andreas Gruenbacher
      When the filesystem block size is smaller than the page size, the last
      page may contain blocks that lie entirely beyond the end of the file.
      Make sure to only allocate blocks that lie at least partially in the
      file.  Allocating blocks beyond that isn't useful, and what's more, they
      will not be zeroed out and may end up containing random data.
      
      With that change in place, make sure we'll still always unstuff stuffed
      inodes: iomap_writepage and iomap_writepages currently can't handle
      stuffed files.
      
      In addition, simplify the end-of-file check and move it further up in
      gfs2_page_mkwrite to avoid weird side effects like unstuffing a file
      when we're not actually going to write to it.
      
      Fixes xfstest generic/263.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
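      A sketch of the clamping described above (editorial illustration;
      page_offset, i_size_read and min_t are standard kernel helpers, the
      helper function itself is made up):

          #include <linux/errno.h>
          #include <linux/fs.h>
          #include <linux/kernel.h>
          #include <linux/pagemap.h>

          /* How many bytes of this page actually lie within the file; only
           * those blocks should be allocated by page_mkwrite. */
          static ssize_t example_mkwrite_length(struct page *page,
                                                struct inode *inode)
          {
              loff_t pos = page_offset(page);
              loff_t size = i_size_read(inode);

              if (pos >= size)
                  return -EFAULT;    /* page lies entirely beyond EOF */
              return min_t(loff_t, PAGE_SIZE, size - pos);
          }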
    • gfs2: Multi-block allocations in gfs2_page_mkwrite · f53056c4
      Committed by Andreas Gruenbacher
      In gfs2_page_mkwrite's gfs2_allocate_page_backing helper, try to
      allocate as many blocks at once as we need.  Pass in the size of the
      requested allocation.
      
      Fixes: 35af80ae ("gfs2: don't use buffer_heads in gfs2_allocate_page_backing")
      Cc: stable@vger.kernel.org # v5.3+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
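      A rough sketch of allocating the whole requested range by iterating
      over iomap extents (editorial illustration; gfs2_iomap_get_alloc is
      gfs2's allocating iomap helper, the loop shape and variable names are
      simplified, not the actual patch):

          do {
              struct iomap iomap = { };

              ret = gfs2_iomap_get_alloc(inode, pos, length, &iomap);
              if (ret)
                  return ret;

              /* The returned extent may be larger than what we asked for;
               * only count the requested part and move to the next extent. */
              if (iomap.length > length)
                  iomap.length = length;
              pos += iomap.length;
              length -= iomap.length;
          } while (length > 0);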
    • gfs2: Improve mmap write vs. punch_hole consistency · 39c3a948
      Committed by Andreas Gruenbacher
      When punching a hole in a file, use filemap_write_and_wait_range to
      write back any dirty pages in the range of the hole.  As a side effect,
      if the hole isn't page aligned, this marks unaligned pages at the
      beginning and the end of the hole read-only.  This is required when the
      block size is smaller than the page size: when those pages are written
      to again after the hole punching, we must make sure that page_mkwrite is
      called for those pages so that the page will be fully allocated and any
      blocks turned into holes from the hole punching will be reallocated.
      (If a page is writably mapped, page_mkwrite won't be called.)
      
      Fixes xfstest generic/567.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
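      A minimal sketch of the write-back step (editorial illustration;
      filemap_write_and_wait_range is the standard helper, the wrapper is
      made up):

          #include <linux/fs.h>

          /* Write back all pages touching the hole; as a side effect, clean
           * pages become write-protected, so a later write to the unaligned
           * edge pages goes through page_mkwrite again. */
          static int example_flush_hole_range(struct inode *inode,
                                              loff_t offset, loff_t length)
          {
              return filemap_write_and_wait_range(inode->i_mapping, offset,
                                                  offset + length - 1);
          }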
  13. 30 Oct 2019 (4 commits)
  14. 24 Oct 2019 (1 commit)
  15. 23 Oct 2019 (1 commit)
  16. 21 Oct 2019 (1 commit)
  17. 15 Oct 2019 (1 commit)
  18. 19 Sep 2019 (1 commit)
  19. 17 Sep 2019 (1 commit)
    • gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps · f0b444b3
      Committed by Bob Peterson
      Function sweep_bh_for_rgrps, a helper for punch_hole, uses the
      variable buf_in_tr to keep track of when it needs to commit pending
      block frees on a partial delete that overflows the transaction
      created for the delete. The problem is that the variable was
      initialized at the start of sweep_bh_for_rgrps but never cleared,
      even when starting a new transaction.
      
      This patch reinitializes the variable when the transaction is
      ended, so the next transaction starts out with it cleared.
      
      Fixes: d552a2b9 ("GFS2: Non-recursive delete")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  20. 07 Sep 2019 (1 commit)
    • gfs2: Improve mmap write vs. truncate consistency · b473bc2d
      Committed by Andreas Gruenbacher
      On filesystems with a block size smaller than PAGE_SIZE, page_mkwrite is
      called for each memory-mapped page before that page can be written to.
      When such a memory-mapped file is truncated down to size x which is not
      a multiple of the page size and then back to a larger size, the page
      straddling size x can end up with a partial block mapping.  In that
      case, make sure to mark that page read-only so that page_mkwrite will be
      called before the page can be written to the next time.
      
      (There is no point in marking the page straddling size x read-only when
      truncating down as writing to memory beyond the end of the file will
      result in SIGBUS instead of growing the file.)
      
      Fixes xfstests generic/029, generic/030 on filesystems with a block size
      smaller than PAGE_SIZE.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  21. 05 Sep 2019 (5 commits)
    • gfs2: Use async glocks for rename · ad26967b
      Committed by Bob Peterson
      Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
      reverse the roles of which directories are "old" and which are "new" for
      the purposes of rename. This can cause deadlocks where two nodes end up
      waiting for each other.
      
      There can be several layers of directory dependencies across many nodes.
      
      This patch fixes the problem by acquiring all of gfs2_rename's inode
      glocks asynchronously and waiting for all glocks to be acquired.  That
      way all inodes are locked regardless of the order.
      
      The timeout value for multiple asynchronous glocks is calculated to be
      the total of the individual wait times for each glock times two.
      
      Since gfs2_exchange is very similar to gfs2_rename, both functions are
      patched in the same way.
      
      A new async glock wait queue, sd_async_glock_wait, keeps a list of
      waiters for these events. If gfs2's holder_wake function detects an
      async holder, it wakes up any waiters for the event. The waiter only
      tests whether any of its requests are still pending.
      
      Since the glocks are sent to dlm asynchronously, the wait function
      needs to check which glocks, if any, were granted.
      
      If a glock is granted by dlm (and therefore held), its minimum hold
      time is checked and adjusted as necessary, as is done for other glock
      grants.
      
      If the event times out, all glocks held thus far must be dequeued to
      resolve any existing deadlocks.  Then, if there are any outstanding
      locking requests, we need to loop around and wait for dlm to respond to
      those requests too.  After we release all requests, we return -ESTALE to
      the caller (vfs rename) which loops around and retries the request.
      
          Node1           Node2
          ---------       ---------
      1.  Enqueue A       Enqueue B
      2.  Enqueue B       Enqueue A
      3.  A granted
      6.                  B granted
      7.  Wait for B
      8.                  Wait for A
      9.                  A times out (since Node 1 holds A)
      10.                 Dequeue B (since it was granted)
      11.                 Wait for all requests from DLM
      12. B Granted (since Node2 released it in step 10)
      13. Rename
      14. Dequeue A
      15.                 DLM Grants A
      16.                 Dequeue A (due to the timeout and since we
                          no longer have B held for our task).
      17. Dequeue B
      18.                 Return -ESTALE to vfs
      19.                 VFS retries the operation, goto step 1.
      
      This release-all-locks / acquire-all-locks may slow rename / exchange
      down as both nodes struggle in the same way and do the same thing.
      However, this will only happen when there is contention for the same
      inodes, which ought to be rare.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
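      A sketch of the wait step described above (editorial illustration;
      wait_event_timeout is the standard kernel primitive and
      sd_async_glock_wait is the wait queue this patch introduces, while
      PER_GLOCK_WAIT and example_any_holder_pending() are placeholders):

          /* Total of the individual per-glock wait times, times two. */
          unsigned long timeout = 2 * num_gh * PER_GLOCK_WAIT;

          if (!wait_event_timeout(sdp->sd_async_glock_wait,
                                  !example_any_holder_pending(ghs, num_gh),
                                  timeout)) {
              /* Timed out: dequeue whatever was granted, wait for the
               * remaining dlm replies, then return -ESTALE so vfs retries. */
              return -ESTALE;
          }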
    • gfs2: create function gfs2_glock_update_hold_time · 01123cf1
      Committed by Andreas Gruenbacher
      This patch moves the code that updates the glock minimum hold time
      into a separate function, which will be called by a future patch.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: separate holder for rgrps in gfs2_rename · bc74aaef
      Committed by Bob Peterson
      Before this patch, gfs2_rename added a holder for the rgrp glock to
      its array of holders, ghs. There's nothing wrong with that, but this
      patch moves it into a holder of its own. This is done to ensure the
      rgrp glock is always locked last, as the proper glock lock ordering
      requires, and also to pave the way for a future patch in which we
      will lock the non-rgrp glocks asynchronously.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Delete an unnecessary check before brelse() · bccaef90
      Committed by Markus Elfring
      The brelse() function tests whether its argument is NULL and then
      returns immediately.  Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      
      [The same applies to brelse() in gfs2_dir_no_add (which Coccinelle
      apparently missed), so fix that as well.]
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
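      The generic before/after pattern for this cleanup (editorial
      illustration, not the exact hunks):

          /* before */
          if (bh)
              brelse(bh);

          /* after: brelse() itself ignores a NULL buffer_head */
          brelse(bh);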
    • gfs2: Minor PAGE_SIZE arithmetic cleanups · 45eb0504
      Committed by Andreas Gruenbacher
      Replace divisions by PAGE_SIZE with shifts by PAGE_SHIFT and similar.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
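      A generic example of the arithmetic being cleaned up (editorial
      illustration, not the exact hunks):

          /* before */
          index  = pos / PAGE_SIZE;
          offset = pos % PAGE_SIZE;

          /* after */
          index  = pos >> PAGE_SHIFT;
          offset = offset_in_page(pos);    /* pos & (PAGE_SIZE - 1) */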
  22. 03 Sep 2019 (4 commits)