1. 15 October 2020 (2 commits)
    • gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe · 68942870
      Committed by Bob Peterson
      Before this patch, when blocks were freed, gfs2 called gfs2_meta_wipe to
      take the metadata out of the pending journal blocks. It did this mostly
      by calling another function, gfs2_remove_from_journal. This was
      shortsighted because it did nothing with jdata blocks, which may also
      be in the journal.
      
      This patch expands the function so that it wipes out jdata blocks from
      the journal as well, and removes them from the ail1 list if they haven't
      been written back yet. Since the function now processes jdata blocks too,
      it has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.
      
      New function gfs2_ail1_wipe needs a stable view of the ail list, so it
      holds sd_ail_lock while removing items. To make that possible, function
      gfs2_remove_from_journal no longer takes sd_ail_lock itself; it is now
      the caller's responsibility to do so.
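
      A minimal sketch of the resulting locking convention (illustrative only,
      not the actual gfs2 code; the helper names and bodies here are
      hypothetical variants):

          /* Before (sketch): the helper took the lock itself. */
          static void remove_from_journal(struct gfs2_sbd *sdp,
                                          struct buffer_head *bh)
          {
                  spin_lock(&sdp->sd_ail_lock);
                  /* ... unlink bh from the journal/ail lists ... */
                  spin_unlock(&sdp->sd_ail_lock);
          }

          /* After (sketch): the caller holds sd_ail_lock across the whole
           * ail1 scan, so the helper runs with the lock already held. */
          static void remove_from_journal_locked(struct gfs2_sbd *sdp,
                                                 struct buffer_head *bh)
          {
                  lockdep_assert_held(&sdp->sd_ail_lock);
                  /* ... unlink bh from the journal/ail lists ... */
          }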
      
      I was going to make sd_ail_lock locking conditional, but the practice is
      generally frowned upon. For details, see: https://lwn.net/Articles/109066/
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump · 0e539ca1
      Committed by Andrew Price
      When an rindex entry is found to be corrupt, compute_bitstructs() calls
      gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:
      
          gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
      
      gfs2_rgrp_dump then dereferences the gl without checking it and we get
      
          BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280
      
      because there's no rgrp glock involved while reading the rindex on mount.
      
      Fix this by changing gfs2_rgrp_dump to take an rgrp argument.
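
      A sketch of the interface change (hedged; the exact prototypes in the
      tree may differ slightly):

          /* Before: took a glock and looked up the rgrp through gl_object,
           * which is NULL while the rindex is being read at mount time. */
          void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl,
                              const char *fs_id_buf);

          /* After: takes the rgrp directly, so gfs2_consist_rgrpd can pass
           * the rgd it already has and no rgrp glock needs to exist. */
          void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_rgrpd *rgd,
                              const char *fs_id_buf);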
      
      Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  2. 06 June 2020 (1 commit)
    • gfs2: Turn gl_delete into a delayed work · a0e3cc65
      Committed by Andreas Gruenbacher
      This requires flushing delayed work items in gfs2_make_fs_ro (which is called
      before unmounting a filesystem).
      
      When inodes are deleted and then recreated, pending gl_delete work items
      would have no effect because the inode generations will have changed, so
      we can cancel any pending gl_delete work before reusing iopen glocks.
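
      The underlying kernel pattern, as a hedged sketch (the structure and
      helper names below are placeholders, not the gfs2 code itself):

          #include <linux/kernel.h>
          #include <linux/workqueue.h>

          /* Placeholder: the real gfs2_glock embeds the delayed work item. */
          struct example_glock {
                  struct delayed_work gl_delete;
          };

          static void delete_work_func(struct work_struct *work)
          {
                  struct example_glock *gl =
                          container_of(to_delayed_work(work),
                                       struct example_glock, gl_delete);

                  pr_info("delayed delete fired for glock %p\n", gl);
                  /* ... the real handler would evict the inode here ... */
          }

          static void example_usage(struct example_glock *gl)
          {
                  INIT_DELAYED_WORK(&gl->gl_delete, delete_work_func);

                  /* queue with a delay instead of running immediately */
                  queue_delayed_work(system_wq, &gl->gl_delete, 5 * HZ);

                  /* before reusing an iopen glock for a recreated inode,
                   * cancel work queued for the old inode generation */
                  cancel_delayed_work(&gl->gl_delete);

                  /* on remount read-only / unmount, wait for pending work */
                  flush_delayed_work(&gl->gl_delete);
          }
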
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  3. 28 March 2020 (4 commits)
    • gfs2: don't lock sd_log_flush_lock in try_rgrp_unlink · e04d339b
      Committed by Bob Peterson
      In function try_rgrp_unlink, we added a temporary lock of the
      sd_log_flush_lock while searching the bitmaps. This protected us from
      problems in which dinodes being freed were still in a state of flux
      because the rgrp was in an active transaction. It was a kludge.
      Now that we've straightened out the code for inode eviction, deletes,
      and all the recovery mess, we no longer need this kludge.
      This patch removes it, and should improve performance.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put · 1595548f
      Committed by Andreas Gruenbacher
      Keeping reservations and quotas separate makes the code easier to review.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Change inode qa_data to allow multiple users · 2fba46a0
      Committed by Bob Peterson
      Before this patch, multiple users called gfs2_qa_alloc, which allocated
      a qadata structure for the inode if quotas were turned on. Later, on
      file close or evict, the structure was deleted with gfs2_qa_delete.
      But several competing processes may need access to the structure, and
      there were races between file close (release) and these other users.
      Thus, a release could delete the structure out from under a process
      that relied upon its existence, for example chown.
      
      This patch changes the management of the qadata structures to a
      get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get,
      and if the structure needs to be allocated, its count starts out at 1.
      Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
      last user to decrement the count to zero frees the memory.
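
      A generic sketch of the get/put scheme described above (illustrative
      only; the real qadata code and its serialization against the inode are
      more involved, and the example_* names are hypothetical):

          #include <linux/refcount.h>
          #include <linux/slab.h>

          struct example_qadata {
                  refcount_t qa_ref;
                  /* ... per-inode quota bookkeeping would live here ... */
          };

          /* "get": allocate on first use, otherwise take another reference.
           * Callers are assumed to serialize access to *slot. */
          static struct example_qadata *example_qa_get(struct example_qadata **slot)
          {
                  if (*slot) {
                          refcount_inc(&(*slot)->qa_ref);
                          return *slot;
                  }
                  *slot = kzalloc(sizeof(**slot), GFP_NOFS);
                  if (*slot)
                          refcount_set(&(*slot)->qa_ref, 1);  /* count starts at 1 */
                  return *slot;
          }

          /* "put": the last user to drop the count to zero frees the memory. */
          static void example_qa_put(struct example_qadata **slot)
          {
                  if (*slot && refcount_dec_and_test(&(*slot)->qa_ref)) {
                          kfree(*slot);
                          *slot = NULL;
                  }
          }
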
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc · d580712a
      Committed by Bob Peterson
      Before this patch, multiple callers called gfs2_rsqa_alloc to force
      the existence of a reservations structure and a quota data structure
      if needed. Now that reservations are handled separately, the wrapper
      only deals with quota data, so we eliminate it in favor of calling
      gfs2_qa_alloc directly.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  4. 10 February 2020 (2 commits)
    • gfs2: Rework how rgrp buffer_heads are managed · b3422cac
      Committed by Bob Peterson
      Before this patch, the rgrp code had a serious problem related to
      how it managed buffer_heads for resource groups. The problem caused
      file system corruption, especially in cases of journal replay.
      
      When an rgrp glock was demoted to transfer ownership to a
      different cluster node, do_xmote() first called rgrp_go_sync and then
      rgrp_go_inval, as expected. rgrp_go_sync called gfs2_rgrp_brelse(),
      which dropped the buffer_head reference count. In most cases, the
      reference count went to zero, which is right. However, there were
      other places where the buffers were handled differently.
      
      After rgrp_go_sync, do_xmote called rgrp_go_inval, which called
      gfs2_rgrp_brelse a second time; rgrp_go_inval's call to
      truncate_inode_pages_range would then get rid of the pages in memory,
      but only if the reference count had dropped to 0.
      
      Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL.
      So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer
      to the buffer_heads in cases where the reference count was still 1.
      Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time,
      it failed the check for "if (bi->bi_bh)" and thus failed to call
      brelse a second time. Because of that, the reference count on those
      buffers sometimes failed to drop from 1 to 0. And that caused
      function truncate_inode_pages_range to keep the pages in page cache
      rather than freeing them.
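
      The problematic pattern, reduced to a sketch (not the literal gfs2
      code):

          /* First release path (sync): drops one reference but also loses
           * the pointer, even when the buffer still has another reference. */
          if (bi->bi_bh) {
                  brelse(bi->bi_bh);      /* refcount may go 2 -> 1, not 1 -> 0 */
                  bi->bi_bh = NULL;
          }

          /* Second release path (invalidate): the NULL check now fails, so
           * the remaining reference is never dropped and
           * truncate_inode_pages_range cannot free the pages. */
          if (bi->bi_bh) {
                  brelse(bi->bi_bh);
                  bi->bi_bh = NULL;
          }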
      
      The next time the rgrp glock was acquired, the metadata read of
      the rgrp buffers re-used the pages in memory, which were now
      wrong because they were likely modified by the other node that
      acquired the glock in EX (which is why we demoted the glock).
      This re-use of the page cache caused corruption because changes
      made by the other nodes were never seen, so the bitmaps were
      inaccurate.
      
      For some reason, the problem became most apparent when journal
      replay forced the replay of rgrps in memory, which caused newer
      rgrp data to be overwritten by the older in-core pages.
      
      A big part of the problem was that the rgrp buffers were released
      in multiple places: the go_unlock function would release them when
      the glock was released rather than when the glock was demoted,
      which is clearly wrong because our intent was to cache them until
      the glock is demoted from SH or EX.
      
      This patch attempts to clean up the mess and make one consistent
      and centralized mechanism for managing the rgrp buffer_heads by
      implementing several changes:
      
      1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync.
         We don't want to release the buffers or zero the pointers when
         syncing for the reasons stated above. It only makes sense to
         release them when the glock is actually invalidated (go_inval).
         And when we do, then we set the bh pointers to NULL.
      2. The go_unlock function (which was only used for rgrps) is
         eliminated, as we've talked about doing many times before.
         The go_unlock function was called too early in the glock dq
         process, and should not happen until the glock is invalidated.
      3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd.
         That will now happen automatically when the rgrp glocks are
         demoted, and shouldn't happen any sooner or later than that.
         Instead, function gfs2_clear_rgrpd has been modified to demote
         the rgrp glocks, and therefore, free those pages, before the
         remaining glocks are culled by gfs2_gl_hash_clear. This
         prevents the gl_object from hanging around when the glocks are
         culled.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Report errors before withdraw · 8dc88ac6
      Committed by Andreas Gruenbacher
      In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before
      withdrawing the filesystem: otherwise, when we withdraw first and withdraw is
      configured to panic, we'll never get to the error reporting.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  5. 21 January 2020 (1 commit)
  6. 03 September 2019 (1 commit)
  7. 28 June 2019 (2 commits)
  8. 05 June 2019 (1 commit)
  9. 08 May 2019 (2 commits)
    • gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke} · fbb27873
      Committed by Andreas Gruenbacher
      Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no
      such thing as an "unrevoke" object; all this function does is remove
      existing revoke objects plus some bookkeeping.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix loop in gfs2_rbm_find (v2) · 71921ef8
      Committed by Andreas Gruenbacher
      Fix the resource group wrap-around logic in gfs2_rbm_find that commit
      e579ed4f broke.  The bug can lead to unnecessary repeated scanning of the
      same bitmaps; there is a risk that future changes will turn this into an
      endless loop.
      
      This is an updated version of commit 2d29f6b9 ("gfs2: Fix loop in
      gfs2_rbm_find") which ended up being reverted because it introduced a
      performance regression in iozone (see commit e74c98ca).  Changes since v1:
      
       - Simplify the wrap-around logic.
      
       - Handle the case where each resource group only has a single bitmap block
         (small filesystem).
      
       - Update rd_extfail_pt whenever we scan the entire bitmap, even when we don't
         start the scan at the very beginning of the bitmap.
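
      The wrap-around structure described above, as a standalone sketch
      (a hypothetical helper, not the gfs2_rbm_find code itself):

          /* Scan every bitmap exactly once, starting at start_bi and
           * wrapping to bitmap 0 after the last one.  The do/while form
           * also covers resource groups with a single bitmap block. */
          static int scan_all_bitmaps(unsigned int nr_bitmaps,
                                      unsigned int start_bi,
                                      int (*scan_one)(unsigned int bi, void *priv),
                                      void *priv)
          {
                  unsigned int bi = start_bi;

                  do {
                          int ret = scan_one(bi, priv);

                          if (ret)        /* found something, or an error */
                                  return ret;
                          bi = (bi + 1) % nr_bitmaps;     /* wrap around */
                  } while (bi != start_bi);

                  return 0;       /* every bitmap scanned, nothing found */
          }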
      
      Fixes: e579ed4f ("GFS2: Introduce rbm field bii")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  10. 01 February 2019 (1 commit)
  11. 12 December 2018 (2 commits)
  12. 09 November 2018 (1 commit)
    • gfs2: Put bitmap buffers in put_super · 10283ea5
      Committed by Andreas Gruenbacher
      gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects
      attached to the resource group glocks.  That function should release the
      buffers attached to the gfs2_bitmap objects (bi_bh), but the call to
      gfs2_rgrp_brelse for doing that is missing.
      
      When gfs2_releasepage later runs across these buffers which are still
      referenced, it refuses to free them.  This causes the pages the buffers
      are attached to to remain referenced as well.  With enough mount/unmount
      cycles, the system will eventually run out of memory.
      
      Fix this by adding the missing call to gfs2_rgrp_brelse in
      gfs2_clear_rgrpd.
      
      (Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.)
      
      Fixes: 39b0f1e9 ("GFS2: Don't brelse rgrp buffer_heads every allocation")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  13. 12 October 2018 (9 commits)
  14. 06 October 2018 (1 commit)
  15. 29 August 2018 (2 commits)
    • gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated · 4f36cb36
      Committed by Bob Peterson
      The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when
      an rgrp buffer is valid. It's cleared when the glock is invalidated,
      signifying that the buffer data is now invalid. But before this
      patch, function update_rgrp_lvb was setting the flag whenever it
      determined it had a valid lvb. That's an invalid assumption:
      just because you have a valid lvb doesn't mean you have valid
      buffers. After all, another node may have made the lvb valid,
      and this node just fetched it from the glock via dlm.
      
      Consider this scenario:
      1. The file system is mounted with RGRPLVB option.
      2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks
         to GL_SKIP, it skips the gfs2_rgrp_bh_get.
      3. Since loops == 0 and the allocation target (ap->target) is
         bigger than the largest known chunk of blocks in the rgrp
         (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses
         the call to gfs2_rgrp_bh_get there as well.
      4. update_rgrp_lvb sees the lvb MAGIC number is valid, so it bypasses
         gfs2_rgrp_bh_get, but it still sets GFS2_RDF_UPTODATE due
         to this invalid assumption.
      5. The next time update_rgrp_lvb is called, it sees the bit is set
         and just returns 0, assuming both the lvb and the rgrp are
         uptodate. But since this is a smaller allocation, or space has
         been freed by another node (adjusting the lvb values),
         it decides to use the rgrp for allocations, with an invalid rd_free
         because the rgrp was never updated.
      
      This patch changes update_rgrp_lvb so it doesn't set the UPTODATE
      flag anymore. That way, it has no choice but to fetch the latest
      values.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: improve debug information when lvb mismatches are found · 72244b6b
      Committed by Bob Peterson
      Before this patch, gfs2_rgrp_bh_get would check for lvb mismatches,
      but it wouldn't tell you what was actually wrong. This patch adds
      more information to help us debug it. It also makes rgrp consistency
      checks dump any bad rgrps, and the rgrp dump code dump any lvbs
      as well as the rgrp itself.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
  16. 08 August 2018 (1 commit)
  17. 07 August 2018 (1 commit)
    • gfs2: Fix gfs2_testbit to use clone bitmaps · dffe12a8
      Committed by Bob Peterson
      Function gfs2_testbit is called in three places. Two of those places,
      gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
      bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
      by the block reservations scheme to determine the length of an extent of
      free blocks. Before this patch, it wasn't using the clone bitmap, which
      means recently-freed blocks were treated as free blocks for the purposes
      of an allocation.
      
      This patch adds a new parameter to gfs2_testbit to indicate whether or
      not the clone bitmaps should be used (if available).
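
      A standalone model of the new parameter's effect (hypothetical helper;
      it assumes gfs2's two-bit-per-block bitmap layout):

          /* Return the 2-bit allocation state of 'block', preferring the
           * clone bitmap (where recently freed blocks still look allocated)
           * when asked to and when a clone exists. */
          static unsigned int test_block_state(const unsigned char *real_bitmap,
                                               const unsigned char *clone_bitmap,
                                               unsigned int block, int use_clone)
          {
                  const unsigned char *buf =
                          (use_clone && clone_bitmap) ? clone_bitmap : real_bitmap;

                  /* four 2-bit block states per byte */
                  return (buf[block / 4] >> ((block % 4) * 2)) & 0x3;
          }
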
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
  18. 27 July 2018 (1 commit)
  19. 25 July 2018 (2 commits)
    • GFS2: rgrp free blocks used incorrectly · f6753df3
      Committed by Bob Peterson
      Before this patch, several functions in rgrp.c checked the value of
      rgd->rd_free_clone. That does not take into account blocks that were
      reserved by a multi-block reservation. This causes a problem when
      space gets tight in the file system. For example, when function
      gfs2_inplace_reserve checks to see if a rgrp has enough blocks to
      satisfy the request, it can accept a rgrp that it should reject
      because, although there are enough blocks to satisfy the request
      _now_, those blocks may be reserved for another running process.
      
      A second problem with this occurs when we've reserved the remaining
      blocks in an rgrp: function rg_mblk_search() can reject an rgrp
      improperly because it calculates:
      
         u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved;
      
      But rd_reserved includes blocks that the current process just
      reserved in its own call to inplace_reserve. For example, it can
      reserve the last 128 blocks of an rgrp, then reject that same rgrp
      because the above calculation works out to free_blocks = 0.
      
      Consequences include, but are not limited to, (1) leaving holes,
      and thus increasing file system fragmentation, and (2) reporting that the
      file system is full long before it actually is.
      
      This patch introduces a new function, rgd_free, which returns the
      number of clone-free blocks (blocks that are truly free as opposed
      to blocks that are still being used because an unlinked file is
      still open) minus the number of blocks reserved by processes, but
      not counting the blocks we ourselves reserved (because obviously
      we need to allocate them).
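
      The intended calculation, as a standalone model (a hypothetical helper;
      the in-tree rgd_free may differ in detail):

          /* Blocks genuinely available to this process: clone-free blocks
           * minus blocks reserved by *other* processes.  Our own reservation
           * is excluded because we intend to allocate from it. */
          static unsigned int model_rgd_free(unsigned int free_clone,     /* rd_free_clone */
                                             unsigned int total_reserved, /* rd_reserved */
                                             unsigned int our_reserved)   /* our own rs */
          {
                  unsigned int other_reserved;

                  if (total_reserved < our_reserved)      /* defensive */
                          return 0;
                  other_reserved = total_reserved - our_reserved;

                  if (free_clone < other_reserved)
                          return 0;
                  return free_clone - other_reserved;
          }
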
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Don't reject a supposedly full bitmap if we have blocks reserved · e79e0e14
      Committed by Bob Peterson
      Before this patch, you could get into situations like this:
      
      1. Process 1 searches for X free blocks, finds them, makes a reservation
      2. Process 2 searches for free blocks in the same rgrp, but now the
         bitmap appears full because process 1's reservation is skipped over.
         So it marks the bitmap as GBF_FULL.
      3. Process 1 tries to allocate blocks from its own reservation, but
         since the GBF_FULL bit is set, it skips over the rgrp and searches
         elsewhere, thus not using its own reservation.
      
      This patch adds an additional check to allow processes to use their
      own reservations.
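
      The shape of the added check, sketched (hedged; not the literal
      condition in gfs2_rbm_find, and have_active_reservation is a
      placeholder for the real test):

          /* Only honor the GBF_FULL shortcut when we do not already hold an
           * active reservation in this rgrp; otherwise keep scanning so we
           * can allocate from our own reserved blocks. */
          if (test_bit(GBF_FULL, &bi->bi_flags) && !have_active_reservation)
                  goto next_bitmap;
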
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  20. 05 July 2018 (2 commits)
  21. 21 June 2018 (1 commit)