1. 18 Dec, 2019 1 commit
  2. 05 Dec, 2019 1 commit
  3. 01 Dec, 2019 1 commit
  4. 24 Nov, 2019 2 commits
    • GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads · 4d7cf69b
      Committed by Tim Smith
      [ Upstream commit 1eb8d7387908022951792a46fa040ad3942b3b08 ]
      
      Flushing the workqueue can cause operations to happen which might
      call gfs2_log_reserve(), or get stuck waiting for locks taken by such
      operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
      will never wake because the only thing which can wake it is gfs2_logd()
      which was already stopped.
      
      This causes umount of a gfs2 filesystem to wedge permanently if, for
      example, the umount immediately follows a large delete operation.
      
      When this occurred, the following stack trace was obtained from the
      umount command:
      
      [<ffffffff81087968>] flush_workqueue+0x1c8/0x520
      [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
      [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
      [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
      [<ffffffff811b7ff7>] kill_block_super+0x27/0x70
      [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
      [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
      [<ffffffff811b79b9>] deactivate_super+0x59/0x60
      [<ffffffff811d2998>] cleanup_mnt+0x58/0x80
      [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
      [<ffffffff8108c87d>] task_work_run+0x7d/0xa0
      [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
      [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
      [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
      [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: Tim Smith <tim.smith@citrix.com>
      Signed-off-by: Mark Syms <mark.syms@citrix.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • gfs2: slow the deluge of io error messages · f3afad5d
      Committed by Bob Peterson
      [ Upstream commit b524abcc01483b2ac093cc6a8a2a7375558d2b64 ]
      
      When an io error is hit, it calls gfs2_io_error_bh_i for every
      journal buffer it can't write. Since we changed gfs2_io_error_bh_i
      recently to withdraw later in the cycle, it sends a flood of
      errors to the console. This patch checks for the file system already
      being withdrawn, and if so, doesn't send more messages. It doesn't
      stop the flood of messages, but it slows it down and keeps it more
      reasonable.
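
The check described above can be sketched in user-space C. This is only an illustrative model, not the kernel code: `struct fs_state` and `report_io_error` are hypothetical names, and the withdrawn state is a plain boolean here. Because the withdraw itself happens later in the cycle, some messages still get through before the flag is set, which is why the commit says the flood is slowed rather than stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the superblock state (not struct gfs2_sbd). */
struct fs_state {
    bool withdrawn;        /* set once the file system has been withdrawn */
    unsigned int printed;  /* number of messages actually emitted */
};

/*
 * Report an I/O error for one journal buffer, unless the file system is
 * already withdrawn. Returns true if a message was printed.
 */
static bool report_io_error(struct fs_state *fs, unsigned long long blkno)
{
    assert(fs != NULL);
    if (fs->withdrawn)
        return false;      /* already withdrawn: stay quiet */
    fprintf(stderr, "I/O error on block %llu\n", blkno);
    fs->printed++;
    return true;
}
```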
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. 21 Nov, 2019 1 commit
    • gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated · 48b128cd
      Committed by Bob Peterson
      [ Upstream commit 4f36cb36c9d14340bb200d2ad9117b03ce992cfe ]
      
      The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when
      a rgrp buffer is valid. It's cleared when the glock is invalidated,
      signifying that the buffer data is now invalid. But before this
      patch, function update_rgrp_lvb was setting the flag when it
      determined it had a valid lvb. But that's an invalid assumption:
      just because you have a valid lvb doesn't mean you have valid
      buffers. After all, another node may have made the lvb valid,
      and this node just fetched it from the glock via dlm.
      
      Consider this scenario:
      1. The file system is mounted with RGRPLVB option.
      2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks
         to GL_SKIP, it skips the gfs2_rgrp_bh_get.
      3. Since loops == 0 and the allocation target (ap->target) is
         bigger than the largest known chunk of blocks in the rgrp
         (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses
         the call to gfs2_rgrp_bh_get there as well.
      4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses
         gfs2_rgrp_bh_get, but it still sets GFS2_RDF_UPTODATE due
         to this invalid assumption.
      5. The next time update_rgrp_lvb is called, it sees the bit is set
         and just returns 0, assuming the lvb and rgrp are both
         uptodate. But since this is a smaller allocation, or space has
         been freed by another node, thus adjusting the lvb values,
         it decides to use the rgrp for allocations, with invalid rd_free
         due to the fact it was never updated.
      
      This patch changes update_rgrp_lvb so it doesn't set the UPTODATE
      flag anymore. That way, it has no choice but to fetch the latest
      values.
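
A simplified user-space model of the post-patch behaviour follows. It is a sketch under stated assumptions: `struct rgrp`, `rgrp_bh_get`, the `RDF_UPTODATE` and `LVB_MAGIC` constants, and the `on_disk_free` parameter are all hypothetical stand-ins chosen for illustration. The point is only that a valid lvb refreshes the cached counts without marking the buffers as valid.

```c
#include <assert.h>
#include <stdint.h>

#define RDF_UPTODATE 0x1u        /* stand-in for GFS2_RDF_UPTODATE */
#define LVB_MAGIC    0x01161970u /* placeholder magic value for this sketch */

/* Hypothetical, heavily reduced resource group state. */
struct rgrp {
    uint32_t flags;
    uint32_t lvb_magic;    /* magic number read from the lock value block */
    uint32_t lvb_free;     /* free count advertised in the lvb */
    uint32_t rd_free;      /* free count cached on this node */
    unsigned int bh_reads; /* number of simulated buffer reads */
};

/* Simulated buffer read: only this path may mark the rgrp up to date. */
static void rgrp_bh_get(struct rgrp *rgd, uint32_t on_disk_free)
{
    rgd->rd_free = on_disk_free;
    rgd->flags |= RDF_UPTODATE;
    rgd->bh_reads++;
}

/* Post-patch model: a valid lvb updates rd_free but never sets RDF_UPTODATE. */
static void update_rgrp_lvb(struct rgrp *rgd, uint32_t on_disk_free)
{
    assert(rgd != 0);
    if (rgd->flags & RDF_UPTODATE)
        return;                          /* buffers were really read */
    if (rgd->lvb_magic != LVB_MAGIC) {
        rgrp_bh_get(rgd, on_disk_free);  /* no usable lvb: read buffers */
        return;
    }
    rgd->rd_free = rgd->lvb_free;        /* refresh from the lvb every time */
}
```

With the flag left clear, a second call after another node has changed the lvb picks up the new free count instead of returning early.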
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  6. 05 Oct, 2019 1 commit
  7. 16 Aug, 2019 1 commit
  8. 31 May, 2019 3 commits
    • gfs2: Fix occasional glock use-after-free · c4b51dbc
      Committed by Andreas Gruenbacher
      [ Upstream commit 9287c6452d2b1f24ea8e84bd3cf6f3c6f267f712 ]
      
      This patch has to do with the life cycle of glocks and buffers.  When
      gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
      object is assigned to track the buffer, and that is queued to various
      lists, including the glock's gl_ail_list to indicate it's on the active
      items list.  Once the page associated with the buffer has been written,
      it is removed from the ail list, but its life isn't over until a revoke
      has been successfully written.
      
      So after the block is written, its bufdata object is moved from the
      glock's gl_ail_list to a file-system-wide list of pending revokes,
      sd_log_le_revoke.  At that point the glock still needs to track how many
      revokes it contributed to that list (in gl_revokes) so that things like
      glock go_sync can ensure all the metadata has been not only written, but
      also revoked before the glock is granted to a different node.  This is
      to guarantee journal replay doesn't replay the block once the glock has
      been granted to another node.
      
      Ross Lagerwall recently discovered a race in which an inode could be
      evicted, and its glock freed after its ail list had been synced, but
      while it still had unwritten revokes on the sd_log_le_revoke list.  The
      evict decremented the glock reference count to zero, which allowed the
      glock to be freed.  After the revoke was written, function
      revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
      and clear its GLF_LFLUSH flag, at which time it referenced the freed
      glock.
      
      This patch fixes the problem by incrementing the glock reference count
      in gfs2_add_revoke when the glock's first bufdata object is moved from
      the glock to the global revokes list. Later, when the glock's last such
      bufdata object is freed, the reference count is decremented. This
      guarantees that whichever process finishes last (the revoke writing or
      the evict) will properly free the glock, and neither will reference the
      glock after it has been freed.
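
The reference-counting rule can be modelled in a few lines of user-space C. This is a sketch, not the kernel implementation: `struct glock` here is a hypothetical reduction to a reference count, a pending-revoke counter and a `freed` marker, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced glock: just the counters relevant to the fix. */
struct glock {
    int ref;      /* reference count */
    int revokes;  /* pending revokes contributed by this glock */
    bool freed;   /* set when the last reference is dropped */
};

static void glock_hold(struct glock *gl)
{
    assert(!gl->freed);
    gl->ref++;
}

static void glock_put(struct glock *gl)
{
    assert(!gl->freed && gl->ref > 0);
    if (--gl->ref == 0)
        gl->freed = true;
}

/* A bufdata moves to the global revoke list: the first one pins the glock. */
static void add_revoke(struct glock *gl)
{
    if (gl->revokes++ == 0)
        glock_hold(gl);
}

/* A revoke was written and its bufdata freed: the last one unpins. */
static void revoke_written(struct glock *gl)
{
    assert(!gl->freed && gl->revokes > 0);
    if (--gl->revokes == 0)
        glock_put(gl);
}
```

In this model an evict that drops the inode's reference while revokes are still pending leaves the glock alive, and the final `revoke_written` call is the one that frees it.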
      Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • gfs2: Fix lru_count going negative · bac85208
      Committed by Ross Lagerwall
      [ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]
      
      Under certain conditions, lru_count may drop below zero resulting in
      a large amount of log spam like this:
      
      vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
          negative objects to delete nr=-1
      
      This happens as follows:
      1) A glock is moved from lru_list to the dispose list and lru_count is
         decremented.
      2) The dispose function calls cond_resched() and drops the lru lock.
      3) Another thread takes the lru lock and tries to add the same glock to
         lru_list, checking if the glock is on an lru list.
      4) It is on a list (actually the dispose list) and so it avoids
         incrementing lru_count.
      5) The glock is moved to lru_list.
      6) The original thread doesn't dispose it because it has been re-added
         to the lru list but the lru_count has still decreased by one.
      
      Fix by checking if the LRU flag is set on the glock rather than checking
      if the glock is on some list and rearrange the code so that the LRU flag
      is added/removed precisely when the glock is added/removed from lru_list.
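
The accounting error in the steps above can be reproduced with a small single-threaded model. Everything here is an assumption made for illustration: the `enum where` list marker, `struct lru`, and the `check_flag` switch between the old test (is the glock on any list?) and the new test (is the LRU flag set?) are hypothetical, and the real code runs under the lru lock.

```c
#include <assert.h>
#include <stdbool.h>

enum where { ON_NO_LIST, ON_LRU_LIST, ON_DISPOSE_LIST };

/* Hypothetical, reduced glock and LRU bookkeeping. */
struct glock {
    enum where list;  /* which list the glock is currently linked on */
    bool lru_flag;    /* models the LRU flag on the glock */
};

struct lru {
    long count;       /* models lru_count */
};

/* check_flag == false models the old test, true models the fixed test. */
static void add_to_lru(struct lru *lru, struct glock *gl, bool check_flag)
{
    bool already_counted = check_flag ? gl->lru_flag
                                      : (gl->list != ON_NO_LIST);
    gl->list = ON_LRU_LIST;
    gl->lru_flag = true;
    if (!already_counted)
        lru->count++;
}

/* The shrinker moves a glock to its private dispose list. */
static void move_to_dispose(struct lru *lru, struct glock *gl)
{
    assert(gl->list == ON_LRU_LIST);
    gl->list = ON_DISPOSE_LIST;
    gl->lru_flag = false;
    lru->count--;
}
```

Running add, dispose, add, dispose with the old test leaves the count at -1, because the second add sees the glock on the dispose list and skips the increment. With the flag-based test the count returns to 0.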
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • gfs2: Fix sign extension bug in gfs2_update_stats · fdc78eed
      Committed by Andreas Gruenbacher
      commit 5a5ec83d6ac974b12085cd99b196795f14079037 upstream.
      
      Commit 4d207133 changed the types of the statistic values in struct
      gfs2_lkstats from s64 to u64.  Because of that, what should be a signed
      value in gfs2_update_stats turned into an unsigned value.  When shifted
      right, we end up with a large positive value instead of a small negative
      value, which results in an incorrect variance estimate.
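
The effect is easy to reproduce outside the kernel. The sketch below is not the gfs2_update_stats code; `step_unsigned` and `step_signed` are hypothetical helpers that compute one smoothing step, delta >> 3, from a running mean and a new sample. It assumes two's complement conversion and an arithmetic right shift for negative signed values, which is what mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: the difference stays unsigned, so a negative delta wraps. */
static uint64_t step_unsigned(uint64_t mean, uint64_t sample)
{
    uint64_t delta = sample - mean;  /* wraps around when sample < mean */
    return delta >> 3;               /* logical shift: huge positive value */
}

/* Fixed variant: the difference is treated as signed before shifting. */
static int64_t step_signed(uint64_t mean, uint64_t sample)
{
    /* Precondition of this sketch: both values fit in the signed range. */
    assert(mean <= (uint64_t)INT64_MAX && sample <= (uint64_t)INT64_MAX);
    int64_t delta = (int64_t)(sample - mean);
    return delta >> 3;               /* arithmetic shift keeps the sign */
}
```

For a mean of 100 and a sample of 92, the unsigned variant yields 0x1FFFFFFFFFFFFFFF while the signed variant yields -1.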
      
      Fixes: 4d207133 ("gfs2: Make statistics unsigned, suitable for use with do_div()")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  9. 14 Mar, 2019 1 commit
  10. 07 Feb, 2019 1 commit
  11. 13 Jan, 2019 2 commits
  12. 01 Dec, 2018 2 commits
  13. 21 Nov, 2018 2 commits
    • gfs2: Fix metadata read-ahead during truncate (2) · 55795dac
      Committed by Andreas Gruenbacher
      commit e7445ced upstream.
      
      The previous attempt to fix metadata read-ahead during truncate was
      incorrect: for files with a height > 2 (larger than 1006989312 bytes
      with a block size of 4096 bytes), read-ahead requests were not being
      issued for some of the indirect blocks discovered while walking the
      metadata tree, leading to significant slow-downs when deleting large
      files.  Fix that.
      
      In addition, only issue read-ahead requests in the first pass through
      the meta-data tree, while deallocating data blocks.
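
The 1006989312-byte threshold can be checked with a short calculation. The sketch assumes 8-byte block pointers, a 232-byte dinode header and a 24-byte metadata header on indirect blocks; those sizes are assumptions of this example and are not stated in the commit message. `max_size_for_height` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk header sizes; not taken from the commit message. */
#define DINODE_HDR 232u  /* assumed size of the dinode header */
#define META_HDR    24u  /* assumed size of an indirect block header */
#define PTR_SIZE     8u  /* size of one block pointer */

/* Largest file size a metadata tree of the given height can address. */
static uint64_t max_size_for_height(uint32_t bsize, unsigned int height)
{
    assert(bsize > DINODE_HDR && bsize % PTR_SIZE == 0);
    if (height == 0)
        return bsize - DINODE_HDR;                    /* stuffed inode */
    uint64_t blocks = (bsize - DINODE_HDR) / PTR_SIZE; /* dinode pointers */
    for (unsigned int h = 1; h < height; h++)
        blocks *= (bsize - META_HDR) / PTR_SIZE;       /* per indirect level */
    return blocks * bsize;
}
```

With 4096-byte blocks this gives 483 pointers in the dinode and 509 per indirect block, so height 2 tops out at 483 * 509 * 4096 = 1006989312 bytes, matching the figure quoted above.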
      
      Fixes: c3ce5aa9 ("gfs2: Fix metadata read-ahead during truncate")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gfs2: Put bitmap buffers in put_super · 8793f67a
      Committed by Andreas Gruenbacher
      commit 10283ea5 upstream.
      
      gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects
      attached to the resource group glocks.  That function should release the
      buffers attached to the gfs2_bitmap objects (bi_bh), but the call to
      gfs2_rgrp_brelse for doing that is missing.
      
      When gfs2_releasepage later runs across these buffers which are still
      referenced, it refuses to free them.  This causes the pages the buffers
      are attached to to remain referenced as well.  With enough mount/unmount
      cycles, the system will eventually run out of memory.
      
      Fix this by adding the missing call to gfs2_rgrp_brelse in
      gfs2_clear_rgrpd.
      
      (Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.)
      
      Fixes: 39b0f1e9 ("GFS2: Don't brelse rgrp buffer_heads every allocation")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  14. 14 Nov, 2018 1 commit
  15. 12 Oct, 2018 1 commit
    • gfs2: Fix iomap buffered write support for journaled files (2) · fee5150c
      Committed by Andreas Gruenbacher
      It turns out that the fix in commit 6636c3cc56 is bad; the assertion
      that the iomap code no longer creates buffer heads is incorrect for
      filesystems that set the IOMAP_F_BUFFER_HEAD flag.
      
      Instead, what's happening is that gfs2_iomap_begin_write treats all
      files that have the jdata flag set as journaled files, which is
      incorrect as long as those files are inline ("stuffed").  We're handling
      stuffed files directly via the page cache, which is why we ended up with
      pages without buffer heads in gfs2_page_add_databufs.
      
      Fix this by handling stuffed journaled files correctly in
      gfs2_iomap_begin_write.
      
      This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  16. 10 Oct, 2018 1 commit
  17. 08 Aug, 2018 1 commit
  18. 07 Aug, 2018 1 commit
    • gfs2: Fix gfs2_testbit to use clone bitmaps · dffe12a8
      Committed by Bob Peterson
      Function gfs2_testbit is called in three places. Two of those places,
      gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
      bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
      by the block reservations scheme to determine the length of an extent of
      free blocks. Before this patch, it wasn't using the clone bitmap, which
      means recently-freed blocks were treated as free blocks for the purposes
      of an allocation.
      
      This patch adds a new parameter to gfs2_testbit to indicate whether or
      not the clone bitmaps should be used (if available).
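
A minimal sketch of a bit test with such a parameter is shown below. It assumes a two-bits-per-block bitmap layout with four blocks per byte, and the `struct bitmap` type, the `testbit` name and the `BLK_*` constants are hypothetical, so this illustrates the idea rather than the actual gfs2_testbit signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BLOCK  2u   /* assumed: two state bits per block */
#define BLOCKS_PER_BYTE 4u
#define STATE_MASK      0x3u

enum { BLK_FREE = 0, BLK_USED = 1 };

/* Hypothetical bitmap pair: the real bitmap plus an optional clone. */
struct bitmap {
    const uint8_t *real;   /* current on-disk state */
    const uint8_t *clone;  /* NULL if there is no clone */
    size_t nblocks;        /* number of blocks covered */
};

/* Return the state of a block, preferring the clone when asked to. */
static unsigned int testbit(const struct bitmap *bi, size_t block,
                            bool use_clone)
{
    assert(block < bi->nblocks);
    const uint8_t *buf = (use_clone && bi->clone) ? bi->clone : bi->real;
    unsigned int shift = (unsigned int)(block % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;
    return (buf[block / BLOCKS_PER_BYTE] >> shift) & STATE_MASK;
}
```

A block that was freed recently reads as free in the real bitmap but still as used in the clone, so an allocator that asks for the clone will not hand it out again too early.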
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
  19. 03 Aug, 2018 1 commit
  20. 27 Jul, 2018 1 commit
  21. 26 Jul, 2018 1 commit
    • gfs2: Special-case rindex for gfs2_grow · 77612578
      Committed by Andreas Gruenbacher
      To speed up the common case of appending to a file,
      gfs2_write_alloc_required presumes that writing beyond the end of a file
      will always require additional blocks to be allocated.  This assumption
      is incorrect for preallocated files, but there are no negative
      consequences as long as *some* space is still left on the filesystem.
      
      One special file that always has some space preallocated beyond the end
      of the file is the rindex: when growing a filesystem, gfs2_grow adds one
      or more new resource groups and appends records describing those
      resource groups to the rindex; the preallocated space ensures that this
      is always possible.
      
      However, when a filesystem is completely full, gfs2_write_alloc_required
      will indicate that an additional allocation is required, and appending
      the next record to the rindex will fail even though space for that
      record has already been preallocated.  To fix that, skip the incorrect
      optimization in gfs2_write_alloc_required, but for the rindex only.
      Other writes to preallocated space beyond the end of the file are still
      allowed to fail on completely full filesystems.
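
The special case can be sketched as follows. This is a byte-granular simplification with hypothetical names: `struct inode_info`, its `allocated_end` field and `range_has_hole` are inventions of this example, standing in for the block-mapping lookup the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced inode description. */
struct inode_info {
    uint64_t size;           /* current file size */
    uint64_t allocated_end;  /* end of allocated (incl. preallocated) space */
    bool is_rindex;          /* true only for the rindex system file */
};

/* Stand-in for the real block-map check. */
static bool range_has_hole(const struct inode_info *ip,
                           uint64_t offset, uint64_t len)
{
    return offset + len > ip->allocated_end;
}

/* Does writing [offset, offset + len) require allocating new blocks? */
static bool write_alloc_required(const struct inode_info *ip,
                                 uint64_t offset, uint64_t len)
{
    assert(len > 0);
    /* Shortcut for appends: presume allocation, except for the rindex. */
    if (!ip->is_rindex && offset + len > ip->size)
        return true;
    return range_has_hole(ip, offset, len);
}
```

For the rindex, an append into preallocated space now reports that no allocation is needed, so it can succeed on a completely full filesystem. Any other file keeps taking the shortcut.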
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
  22. 25 Jul, 2018 9 commits
    • GFS2: rgrp free blocks used incorrectly · f6753df3
      Committed by Bob Peterson
      Before this patch, several functions in rgrp.c checked the value of
      rgd->rd_free_clone. That does not take into account blocks that were
      reserved by a multi-block reservation. This causes a problem when
      space gets tight in the file system. For example, when function
      gfs2_inplace_reserve checks to see if a rgrp has enough blocks to
      satisfy the request, it can accept a rgrp that it should reject
      because, although there are enough blocks to satisfy the request
      _now_, those blocks may be reserved for another running process.
      
      A second problem with this occurs when we've reserved the remaining
      blocks in an rgrp: function rg_mblk_search() can reject an rgrp
      improperly because it calculates:
      
         u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved;
      
      But rd_reserved includes blocks that the current process just
      reserved in its own call to inplace_reserve. For example, it can
      reserve the last 128 blocks of an rgrp, then reject that same rgrp
      because the above calculates out to free_blocks = 0;
      
      Consequences include, but are not limited to, (1) leaving holes,
      and thus increasing file system fragmentation, and (2) reporting
      file system is full long before it actually is.
      
      This patch introduces a new function, rgd_free, which returns the
      number of clone-free blocks (blocks that are truly free as opposed
      to blocks that are still being used because an unlinked file is
      still open) minus the number of blocks reserved by processes, but
      not counting the blocks we ourselves reserved (because obviously
      we need to allocate them).
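
The difference between the old calculation and the new helper can be sketched like this. It is an approximation, not the kernel function: `struct rgrp` is reduced to two counters, the caller's own reservation is passed as a plain `own_reserved` argument, and returning 0 when other reservations exceed the clone-free count is a choice made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced resource group counters. */
struct rgrp {
    uint32_t rd_free_clone;  /* blocks that are truly free */
    uint32_t rd_reserved;    /* blocks reserved by all processes */
};

/* Old calculation: also subtracts the caller's own reservation. */
static uint32_t old_free_blocks(const struct rgrp *rgd)
{
    return rgd->rd_free_clone - rgd->rd_reserved;
}

/* Sketch of rgd_free: only other processes' reservations are subtracted. */
static uint32_t rgd_free(const struct rgrp *rgd, uint32_t own_reserved)
{
    assert(rgd->rd_reserved >= own_reserved);
    uint32_t others = rgd->rd_reserved - own_reserved;
    if (rgd->rd_free_clone < others)
        return 0;            /* clamp chosen for this sketch */
    return rgd->rd_free_clone - others;
}
```

In the example from the commit message, a process that reserved the last 128 blocks sees 0 free blocks with the old calculation and 128 with the new one.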
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: remove redundant variable 'moved' · d1b0cb93
      Committed by Colin Ian King
      Variable 'moved' is assigned but never used, hence it is
      redundant and can be removed.  This has been the case ever since commit
      c752666c.
      
      Cleans up clang warning:
      warning: variable 'moved' set but not used [-Wunused-but-set-variable]
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: use iomap_readpage for blocksize == PAGE_SIZE · f95cbb44
      Committed by Andreas Gruenbacher
      We only use iomap_readpage for pages that don't have buffer heads
      attached yet: iomap_readpage would otherwise read pages from disk that
      are marked buffer_uptodate() but not PageUptodate().  Those pages may
      actually contain data more recent than what's on disk.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Use iomap for stuffed direct I/O reads · 1d45bb7f
      Committed by Andreas Gruenbacher
      Remove the fallback code from direct to buffered I/O for stuffed reads.
      
      For stuffed writes, we must keep the fallback code: the deferred glock
      we are holding under direct I/O doesn't allow writing to the inode or
      change the file size.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: fallocate_chunk: Always initialize struct iomap · c2589282
      Committed by Andreas Gruenbacher
      In fallocate_chunk, always initialize the iomap before calling
      gfs2_iomap_get_alloc: future changes could otherwise cause things like
      iomap.flags to leak across calls.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • GFS2: Fix recovery issues for spectators · 4a772772
      Committed by Bob Peterson
      This patch fixes a couple of problems dealing with spectators who
      remain with gfs2 mounts after the last non-spectator node fails.
      
      Before this patch, spectator mounts would try to acquire the dlm's
      mounted lock EX as part of their normal recovery sequence.
      The mounted lock is only used to determine whether the node is
      the first mounter, the first node to mount the file system, for
      the purposes of file system recovery and journal replay.
      
      It's not necessary for spectators: they should never do journal
      recovery. If they acquire the lock it will prevent another "real"
      first-mounter from acquiring the lock in EX mode, which means it
      also cannot do journal recovery because it doesn't think it's the
      first node to mount the file system.
      
      This patch checks if the mounter is a spectator, and if so, avoids
      grabbing the mounted lock. This allows a secondary mounter who is
      really the first non-spectator mounter, to do journal recovery:
      since the spectator doesn't acquire the lock, it can grab it in
      EX mode, and therefore consider itself to be the first mounter
      both as a "real" first mount, and as a first-real-after-spectator.
      
      Note that the control lock still needs to be taken in PR mode
      in order to fetch the lvb value so it has the current status of
      all journals' recovery. This is used as it is today by a first
      mounter to replay the journals. For spectators, it's merely
      used to fetch the status bits. All recovery is bypassed and the
      node waits until recovery is completed by a non-spectator node.
      
      I also improved the cryptic message given by control_mount when
      a spectator is waiting for a non-spectator to perform recovery.
      
      It also fixes a problem in gfs2_recover_set whereby spectators
      were never queueing recovery work for their own journal.
      They cannot do recovery themselves, but they still need to queue
      the work so they can check the recovery bits and clear the
      DFL_BLOCK_LOCKS bit once the recovery happens on another node.
      
      When the work queue runs on a spectator, it bypasses most of the
      work so it won't print a bunch of annoying messages. All it will
      print is a bunch of messages that look like this until recovery
      completes on the non-spectator node:
      
      GFS2: fsid=mycluster:scratch.s: recover generation 3 jid 0
      GFS2: fsid=mycluster:scratch.s: recover jid 0 result busy
      
      These continue every 1.5 seconds until the recovery is done by
      the non-spectator, at which time it says:
      
      GFS2: fsid=mycluster:scratch.s: recover generation 4 done
      
      Then it proceeds with its mount.
      
      If the file system is mounted in spectator mode and the last
      remaining non-spectator is fenced, any IO to the file system is
      blocked by dlm and the spectator waits until recovery is
      performed by a non-spectator.
      
      If a spectator tries to mount the file system before any
      non-spectators, it blocks and repeatedly gives this kernel
      message:
      
      GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
      GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • fs: gfs2: Adding new return type vm_fault_t · 109dbb1e
      Committed by Souptick Joarder
      Use new return type vm_fault_t for gfs2_page_mkwrite
      handler.
      
      See commit 1c8f4220 ("mm: change return type to
      vm_fault_t") for reference.
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr · 910f3d58
      Committed by Chengguang Xu
      It seems better to get size by calling posix_acl_xattr_size() instead of
      calling posix_acl_to_xattr() with NULL buffer argument.
      
      posix_acl_xattr_size() never returns 0, so remove the unnecessary check.
      Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Don't reject a supposedly full bitmap if we have blocks reserved · e79e0e14
      Committed by Bob Peterson
      Before this patch, you could get into situations like this:
      
      1. Process 1 searches for X free blocks, finds them, makes a reservation
      2. Process 2 searches for free blocks in the same rgrp, but now the
         bitmap is full because process 1's reservation is skipped over.
         So it marks the bitmap as GBF_FULL.
      3. Process 1 tries to allocate blocks from its own reservation, but
         since the GBF_FULL bit is set, it skips over the rgrp and searches
         elsewhere, thus not using its own reservation.
      
      This patch adds an additional check to allow processes to use their
      own reservations.
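
One way to picture the additional check is the sketch below. It is only a model under assumptions: `struct bitmap`, `struct reservation`, `should_skip_bitmap` and the numeric value given to `GBF_FULL` are hypothetical, and "active reservation" is reduced to having a non-zero count of reserved blocks left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GBF_FULL 0x1u  /* flag value chosen for this sketch */

/* Hypothetical, reduced bitmap and reservation state. */
struct bitmap {
    uint32_t flags;
};

struct reservation {
    uint32_t free;  /* blocks still held by this reservation */
};

/* Skip a bitmap marked full only if the caller has no active reservation. */
static bool should_skip_bitmap(const struct bitmap *bi,
                               const struct reservation *rs)
{
    assert(bi != NULL);
    bool has_reservation = (rs != NULL && rs->free > 0);
    return (bi->flags & GBF_FULL) && !has_reservation;
}
```

A process that still holds reserved blocks therefore searches the bitmap even though another process marked it full, while processes without a reservation continue to skip it.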
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  23. 12 Jul, 2018 4 commits