1. 11 Jan, 2022 1 commit
  2. 05 Dec, 2021 4 commits
  3. 02 Dec, 2021 4 commits
    • A
      gfs2: gfs2_create_inode rework · 3d36e57f
      Andreas Gruenbacher committed
      When gfs2_lookup_by_inum() calls gfs2_inode_lookup() for an uncached
      inode, gfs2_inode_lookup() will place a new tentative inode into the
      inode cache before verifying that there is a valid inode at the given
      address.  This can race with gfs2_create_inode() which doesn't check for
      duplicate inodes.  gfs2_create_inode() will try to assign the new inode
      to the corresponding inode glock, and glock_set_object() will complain
      that the glock is still in use by gfs2_inode_lookup's tentative inode.
      
      We noticed this bug after adding commit 486408d6 ("gfs2: Cancel
      remote delete work asynchronously") which allowed delete_work_func() to
      race with gfs2_create_inode(), but the same race exists for
      open-by-handle.
      
      Fix that by switching from insert_inode_hash() to
      insert_inode_locked4(), which does check for duplicate inodes.  We know
      we've just managed to allocate the new inode, so an inode tentatively
      created by gfs2_inode_lookup() will eventually go away and
      insert_inode_locked4() will always succeed.
      
      In addition, don't flush the inode glock work anymore (this can now only
      make things worse) and clean up glock_{set,clear}_object for the inode
      glock somewhat.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      3d36e57f
    • A
      gfs2: gfs2_inode_lookup rework · 5f6e13ba
      Andreas Gruenbacher committed
      Rework gfs2_inode_lookup() to only set up the new inode's glocks after
      verifying that the new inode is valid.
      
      There is no need for flushing the inode glock work queue anymore now,
      so remove that as well.
      
      While at it, get rid of the useless wrapper around iget5_locked() and
      its unnecessary is_bad_inode() check.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      5f6e13ba
    • A
      gfs2: gfs2_inode_lookup cleanup · b8e12e35
      Andreas Gruenbacher committed
      In gfs2_inode_lookup, once the inode has been looked up, we check if the
      inode generation (no_formal_ino) is the one we're looking for.  If it
      isn't and the inode wasn't in the inode cache, we discard the newly
      looked up inode.  This is unnecessary, complicates the code, and makes
      future changes to gfs2_inode_lookup harder, so change the code to retain
      newly looked up inodes instead.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b8e12e35
    • A
      gfs2: Fix remote demote of weak glock holders · e11b02df
      Andreas Gruenbacher committed
      When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in
      response to a remote locking conflict, we don't set the HIF_HOLDER flag.  This
      causes function may_grant to BUG.  Fix by setting the missing HIF_HOLDER flag
      in the mock glock holder.
      
      In addition, define the mock glock holder where it is used.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      e11b02df
  4. 11 Nov, 2021 1 commit
    • A
      gfs2: Prevent endless loops in gfs2_file_buffered_write · 554c577c
      Andreas Gruenbacher committed
      Currently, instead of performing a short write,
      iomap_file_buffered_write will fail when part of its iov iterator cannot
      be read.  In contrast, gfs2_file_buffered_write will loop around if it
      can read part of the iov iterator, so we can end up in an endless loop.
      
      This should be fixed in iomap_file_buffered_write (and also
      generic_perform_write), but this comes a bit late in the 5.16
      development cycle, so work around it in the filesystem by
      trimming the iov iterator to the known-good size for now.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      554c577c
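The effect of trimming can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `model_buffered_write` is a hypothetical stand-in for a write helper that fails outright when any part of the requested range is unreadable, `readable` stands for the known-good size, and `max_tries` exists only so that the untrimmed case terminates in the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of a write helper that fails with -EFAULT instead of
 * performing a short write when any byte of the source range is unreadable.
 */
static long model_buffered_write(size_t count, size_t readable)
{
	return count <= readable ? (long)count : -EFAULT;
}

/*
 * Hypothetical caller that retries as long as part of the range is readable.
 * Without trimming, every retry fails the same way; -ELOOP is a model-only
 * marker for "this would never end".  With trimming, the request is clamped
 * to the known-good size and completes as a short write.
 */
static long model_write_loop(size_t count, size_t readable, int trim,
			     int max_tries)
{
	if (count > 0 && readable == 0)
		return -EFAULT;		/* nothing can be read at all */

	for (int i = 0; i < max_tries; i++) {
		size_t window = count;
		long ret;

		if (trim && window > readable)
			window = readable;	/* trim to the known-good size */
		ret = model_buffered_write(window, readable);
		if (ret != -EFAULT)
			return ret;		/* full or short write */
		/* Part of the range is readable, so the caller tries again. */
	}
	return -ELOOP;
}
```

In this model, an 8192-byte request with only 4096 readable bytes never completes without trimming, and completes as a 4096-byte short write with it.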
  5. 08 Nov, 2021 1 commit
  6. 06 Nov, 2021 3 commits
    • A
      gfs2: Fix length of holes reported at end-of-file · f3506eee
      Andreas Gruenbacher committed
      Fix the length of holes reported at the end of a file: the length is
      relative to the beginning of the extent, not the seek position which is
      rounded down to the filesystem block size.
      
      This bug went unnoticed for some time, but is now caught by the
      following assertion in iomap_iter_done():
      
        WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos)
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      f3506eee
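A worked example may help show why the assertion fires. The sketch below is a userspace model with hypothetical names (`model_extent`, `hole_at_eof`); it assumes the buggy computation measured the hole length from the seek position while the extent offset was rounded down to the block size, as the message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the offset/length pair of a reported extent. */
struct model_extent {
	uint64_t offset;
	uint64_t length;
};

/*
 * Report the hole between the block containing pos and the end of the file.
 * from_extent_start selects the fixed computation (length measured from the
 * start of the extent) over the assumed buggy one (measured from pos).
 */
static struct model_extent hole_at_eof(uint64_t pos, uint64_t size,
				       unsigned int blkbits,
				       int from_extent_start)
{
	struct model_extent e;

	assert(pos < size);
	e.offset = (pos >> blkbits) << blkbits;	/* round down to block size */
	e.length = from_extent_start ? size - e.offset : size - pos;
	return e;
}

/* Negation of the WARN_ON_ONCE condition quoted in the commit message. */
static int extent_covers_pos(struct model_extent e, uint64_t pos)
{
	return e.offset + e.length > pos;
}
```

With 4096-byte blocks, a file size of 4100 and a seek position of 4098, the extent starts at 4096. Measuring from the seek position gives a length of 2, so the extent ends at 4098 and does not cover the position; measuring from the extent start gives a length of 4, so the extent ends at 4100.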
    • B
      gfs2: release iopen glock early in evict · 49462e2b
      Bob Peterson committed
      Before this patch, evict would clear the iopen glock's gl_object after
      releasing the inode glock.  In the meantime, another process could reuse
      the same block and thus glocks for a new inode.  It would lock the inode
      glock (exclusively), and then the iopen glock (shared).  The shared
      locking mode doesn't provide any ordering against the evict, so by the
      time the iopen glock is reused, evict may not have gotten to setting
      gl_object to NULL.
      
      Fix that by releasing the iopen glock before the inode glock in
      gfs2_evict_inode.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      49462e2b
    • A
      gfs2: Fix atomic bug in gfs2_instantiate · 7a92deaa
      Andreas Gruenbacher committed
      Replace test_bit() + set_bit() with test_and_set_bit() where we need an atomic
      operation.  Use clear_and_wake_up_bit() instead of open coding it.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      7a92deaa
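The difference between a separate test followed by a set and a single atomic test-and-set can be sketched in userspace with C11 atomics. The `model_*` helpers below are hypothetical analogues, not the kernel bit operations, and the clear helper omits the wake-up that clear_and_wake_up_bit() also performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogue of an atomic test-and-set: sets the bit and returns
 * its previous value in one indivisible step, so two racing callers cannot
 * both observe "was clear".
 */
static bool model_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Clears the bit atomically (no waiter wake-up in this model). */
static void model_clear_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_and(word, ~(1UL << nr));
}

/* Returns true only for the one caller that flipped the bit from 0 to 1. */
static bool model_try_claim(atomic_ulong *word, unsigned int nr)
{
	return !model_test_and_set_bit(nr, word);
}
```

With a separate test followed by a set, two callers could both see the bit clear before either sets it; with the combined operation, only the caller that sees the old value as clear proceeds.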
  7. 03 Nov, 2021 1 commit
  8. 25 Oct, 2021 22 commits
    • T
      gfs2: Fix unused value warning in do_gfs2_set_flags() · e34e6f81
      Tim Gardner committed
      Coverity complains of an unused value:
      
      CID 119623 (#1 of 1): Unused value (UNUSED_VALUE)
      assigned_value: Assigning value -1 to error here, but that stored value is
      overwritten before it can be used.
      237        error = -EPERM;
      
      Fix it by removing the assignment.
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      e34e6f81
    • A
      gfs2: check context in gfs2_glock_put · 660a6126
      Alexander Aring committed
      Add a might_sleep call into gfs2_glock_put which can sleep in DLM when
      the last reference is released.  This will show problems earlier, and
      not only when the last reference is put.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      660a6126
    • A
      gfs2: Fix glock_hash_walk bugs · 7427f3bb
      Andreas Gruenbacher committed
      So far, glock_hash_walk took a reference on each glock it iterated over, and it
      was the examiner's responsibility to drop those references.  Dropping the final
      reference to a glock can sleep and the examiners are called in an RCU critical
      section with spin locks held, so examiners that didn't need the extra reference
      had to drop it asynchronously via gfs2_glock_queue_put or similar.  This wasn't
      done correctly in thaw_glock which did call gfs2_glock_put, and not at all in
      dump_glock_func.
      
      Change glock_hash_walk to not take glock references at all.  That way, the
      examiners that don't need them won't have to bother with slow asynchronous
      puts, and the examiners that do need references can take them themselves.
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      7427f3bb
    • A
      gfs2: Cancel remote delete work asynchronously · 486408d6
      Andreas Gruenbacher committed
      In gfs2_inode_lookup and gfs2_create_inode, we're calling
      gfs2_cancel_delete_work which currently cancels any remote delete work
      (delete_work_func) synchronously.  This means that if the work is
      currently running, it will wait for it to finish.  We're doing this to
      prevent a previous instance of an inode from having any influence on the
      next instance.
      
      However, delete_work_func uses gfs2_inode_lookup internally, and we can
      end up in a deadlock when delete_work_func gets interrupted at the wrong
      time.  For example,
      
        (1) An inode's iopen glock has delete work queued, but the inode
            itself has been evicted from the inode cache.
      
        (2) The delete work is preempted before reaching gfs2_inode_lookup.
      
        (3) Another process recreates the inode (gfs2_create_inode).  It tries
            to cancel any outstanding delete work, which blocks waiting for
            the ongoing delete work to finish.
      
        (4) The delete work calls gfs2_inode_lookup, which blocks waiting for
            gfs2_create_inode to instantiate and unlock the new inode =>
            deadlock.
      
      It turns out that when the delete work notices that its inode has been
      re-instantiated, it will do nothing.  This means that it's safe to
      cancel the delete work asynchronously.  This prevents the kind of
      deadlock described above.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      486408d6
    • B
      gfs2: set glock object after nq · 8793e149
      Bob Peterson committed
      Before this patch, function gfs2_create_inode called glock_set_object to
      set the gl_object for inode and iopen glocks before the glock was locked.
      That's wrong because other competing processes like evict may be
      blocked waiting for the glock and still have gl_object set before the
      actual eviction can take place.
      
      This patch moves the call to glock_set_object to after the glock is
      acquired in function gfs2_create_inode, so it waits for possibly
      competing evicts to finish their processing first.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      8793e149
    • B
      gfs2: remove RDF_UPTODATE flag · 4b3113a2
      Bob Peterson committed
      The new GLF_INSTANTIATE_NEEDED flag obsoletes the old rgrp flag
      GFS2_RDF_UPTODATE, so this patch replaces it like we did with inodes.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      4b3113a2
    • B
      gfs2: Eliminate GIF_INVALID flag · ec1d398d
      Bob Peterson committed
      With the addition of the new GLF_INSTANTIATE_NEEDED flag, the
      GIF_INVALID flag is now redundant. This patch removes it.
      Since inode_instantiate is only called when instantiation is needed,
      the check in inode_instantiate is removed too.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      ec1d398d
    • B
      gfs2: fix GL_SKIP node_scope problems · f2e70d8f
      Bob Peterson committed
      Before this patch, when a glock was locked, the very first holder on the
      queue would unlock the lockref and call the go_instantiate glops function
      (if one existed), unless GL_SKIP was specified. When we introduced the new
      node-scope concept, we allowed multiple holders to lock glocks in EX mode
      and share the lock.
      
      But node-scope introduced a new problem: if the first holder has GL_SKIP
      and the next one does NOT, since it is not the first holder on the queue,
      the go_instantiate op was not called. Eventually the GL_SKIP holder may
      call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
      still a window of time in which another non-GL_SKIP holder assumes the
      instantiate function had been called by the first holder. In the case of
      rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.
      
      This patch tries to fix the problem by introducing two new glock flags:
      
      GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
      needs to be called to "fill in" or "read in" the object before it is
      referenced.
      
      GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
      in the process of reading in the object. Whenever a function needs to
      reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
      set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
      function.
      
      As before, the gl_lockref spin_lock is unlocked during the IO operation,
      which may take a relatively long amount of time to complete. While
      unlocked, if another process determines go_instantiate is still needed,
      it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
      glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
      it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
      go_instantiate operation may not have been successful.
      
      Functions that previously called the instantiate sub-functions now call
      directly into gfs2_instantiate so the new bits are managed properly.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      f2e70d8f
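The two-flag protocol described above can be sketched as a small userspace model. Everything named `model_*` or `MODEL_*` is hypothetical. The model uses C11 atomics and polling, whereas the message describes a spin lock that is dropped around the I/O and a waiter that sleeps, so this shows the state transitions only, not the kernel's locking.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>

#define MODEL_INSTANTIATE_NEEDED	(1u << 0)
#define MODEL_INSTANTIATE_IN_PROG	(1u << 1)

struct model_glock {
	atomic_uint flags;
	int (*instantiate)(struct model_glock *gl);	/* "read in" the object */
	int calls;					/* times the reader ran */
};

/* Stand-in readers: one that fails and one that succeeds. */
static int model_read_fail(struct model_glock *gl) { gl->calls++; return -EIO; }
static int model_read_ok(struct model_glock *gl) { gl->calls++; return 0; }

static int model_instantiate(struct model_glock *gl)
{
	for (;;) {
		int ret = 0;

		if (!(atomic_load(&gl->flags) & MODEL_INSTANTIATE_NEEDED))
			return 0;	/* object is already filled in */

		if (atomic_fetch_or(&gl->flags, MODEL_INSTANTIATE_IN_PROG) &
		    MODEL_INSTANTIATE_IN_PROG) {
			/* Someone else is reading: wait, then re-check NEEDED,
			 * because their attempt may have failed. */
			while (atomic_load(&gl->flags) & MODEL_INSTANTIATE_IN_PROG)
				sched_yield();
			continue;
		}

		/* We own IN_PROG; NEEDED may have been cleared meanwhile. */
		if (atomic_load(&gl->flags) & MODEL_INSTANTIATE_NEEDED) {
			ret = gl->instantiate(gl);
			if (ret == 0)
				atomic_fetch_and(&gl->flags,
						 ~MODEL_INSTANTIATE_NEEDED);
		}
		atomic_fetch_and(&gl->flags, ~MODEL_INSTANTIATE_IN_PROG);
		return ret;
	}
}
```

A failed read leaves the needed flag set, so a later caller retries; once a read succeeds, subsequent callers return without invoking the reader again.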
    • B
      gfs2: split glock instantiation off from do_promote · e6f85600
      Bob Peterson committed
      Before this patch, function do_promote had a section of code that did
      the actual instantiation.  This patch splits that off into its own
      function, gfs2_instantiate, which prepares us for the next patch that
      will use that function.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      e6f85600
    • B
      gfs2: further simplify do_promote · 60d8bae9
      Bob Peterson committed
      This patch further simplifies function do_promote by eliminating some
      redundant code in favor of using a lock_released flag. This is just
      prep work for a future patch.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      60d8bae9
    • B
      gfs2: re-factor function do_promote · 17a6ecee
      Bob Peterson committed
      This patch simply re-factors function do_promote to reduce the indents.
      The logic should be unchanged. This makes future patches more readable.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      17a6ecee
    • A
      gfs2: Remove 'first' trace_gfs2_promote argument · d74d0ce5
      Andreas Gruenbacher committed
      Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the
      'first' holder isn't the one that instantiates the glock
      (gl_instantiate), which is what the 'first' flag was apparently supposed
      to indicate.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d74d0ce5
    • B
      gfs2: change go_lock to go_instantiate · 3278b977
      Bob Peterson committed
      Before this patch, the go_lock glock operations (glops) did not do
      any actual locking. They were used to instantiate objects, like reading
      in dinodes and rgrps from the media.
      
      This patch renames the functions to go_instantiate for clarity.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      3278b977
    • B
      gfs2: dump glocks from gfs2_consist_OBJ_i · a739765c
      Bob Peterson committed
      Before this patch, failed consistency checks printed out the object
      that failed, but not the object's glock. This patch makes it also
      print out the object glock so we can see the glock's holders and flags
      to aid with debugging.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a739765c
    • B
      gfs2: dequeue iopen holder in gfs2_inode_lookup error · 763766c0
      Bob Peterson committed
      Before this patch, if function gfs2_inode_lookup encountered an error
      after it had locked the iopen glock, it never unlocked it, relying on
      the evict code to do the cleanup.  The evict code then took the
      inode glock while holding the iopen glock, which violates the locking
      order.  For example,
      
       (1) node A does a gfs2_inode_lookup that fails, leaving the iopen glock
           locked.
      
       (2) node B calls delete_work_func -> gfs2_lookup_by_inum ->
           gfs2_inode_lookup.  It locks the inode glock and blocks trying to
           lock the iopen glock, which is held by node A.
      
       (3) node A eventually calls gfs2_evict_inode -> evict_should_delete.
           It blocks trying to lock the inode glock, which is now held by
           node B.
      
      This patch introduces error handling to function gfs2_inode_lookup
      so it properly dequeues held iopen glocks on errors.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      763766c0
    • A
      gfs2: Save ip from gfs2_glock_nq_init · b016d9a8
      Andreas Gruenbacher committed
      Before this patch, when a glock was locked by function gfs2_glock_nq_init,
      it initialized the holder gh_ip (return address) as gfs2_glock_nq_init.
      That made it extremely difficult to track down problems because many
      functions call gfs2_glock_nq_init. This patch changes the function so
      that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes
      it easy to backtrack which holder took the lock.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      b016d9a8
    • B
      gfs2: Allow append and immutable bits to coexist · a500bd31
      Bob Peterson committed
      Before this patch, function do_gfs2_set_flags checked if the append
      and immutable flags were being set while already set. If so, error -EPERM
      was given. There's no reason why these two flags should be mutually
      exclusive, and if you set them separately, you will, in essence, set
      one while it is already set. For example:
      
      chattr +a /mnt/gfs2/file1
      chattr +i /mnt/gfs2/file1
      
      The first command sets the append-only flag. Since they are additive,
      the second command sets the immutable flag AND append-only flag,
      since they both coexist in i_diskflags. So the second command should
      not return an error. This bug caused xfstests generic/545 to fail.
      
      This patch simply removes the invalid checks.
      I also eliminated an unused parm from do_gfs2_set_flags.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a500bd31
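The chattr sequence above can be traced with a small model of the flag arithmetic. The flag constants and helper names below are hypothetical, and `model_old_check` only mirrors the check as the message describes it (reject a flag that is requested while already set); it is not the removed kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only. */
#define MODEL_FLAG_APPEND	(1u << 0)
#define MODEL_FLAG_IMMUTABLE	(1u << 1)

/* New flag word: bits outside mask are kept, bits inside come from the request. */
static uint32_t model_merge_flags(uint32_t current, uint32_t requested,
				  uint32_t mask)
{
	return (current & ~mask) | (requested & mask);
}

/* Model of the removed check: fail if a flag is requested while already set. */
static int model_old_check(uint32_t current, uint32_t new_flags)
{
	if ((current & MODEL_FLAG_IMMUTABLE) && (new_flags & MODEL_FLAG_IMMUTABLE))
		return -EPERM;
	if ((current & MODEL_FLAG_APPEND) && (new_flags & MODEL_FLAG_APPEND))
		return -EPERM;
	return 0;
}
```

After the first command the inode carries the append-only bit. The second command requests both bits, so the merged word contains append-only while it is already set, and the modeled check returns -EPERM even though nothing invalid was asked for.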
    • B
      gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug · c98c2ca5
      Bob Peterson committed
      In rgrp.c, there are several places where it does BUG_ON. This tells us
      the call stack but nothing more, which is not very helpful.
      This patch switches them to GLOCK_BUG_ON which also prints the glock,
      its holders, and many of the rgrp values, which will help us debug
      problems in the future.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c98c2ca5
    • B
      gfs2: move GL_SKIP check from glops to do_promote · c1442f6b
      Bob Peterson committed
      Before this patch, each individual "go_lock" glock operation (glop)
      checked the GL_SKIP flag, and if set, would skip further processing.
      
      This patch changes the logic so the go_lock caller, function do_promote,
      checks the GL_SKIP flag before calling the go_lock op in the first place.
      This avoids having to unnecessarily unlock gl_lockref.lock only to
      re-lock it.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c1442f6b
    • B
      gfs2: Add GL_SKIP holder flag to dump_holder · 4c69038d
      Bob Peterson committed
      Somehow, the GL_SKIP flag was missed when dumping glock holders.
      This patch adds it to function hflags2str. I added it at the end because
      I wanted Holder and Skip flags together to read "Hs" rather than "sH"
      to avoid confusion with "Shared" ("SH") holder state.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      4c69038d
    • B
      gfs2: remove redundant check in gfs2_rgrp_go_lock · 6edb6ba3
      Bob Peterson committed
      Before this patch, function gfs2_rgrp_go_lock checked if GL_SKIP and
      ar_rgrplvb were both true. However, GL_SKIP is only set for rgrps if
      ar_rgrplvb is true (see gfs2_inplace_reserve). This patch simply removes
      the redundant check.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      6edb6ba3
    • A
      gfs2: Fix mmap + page fault deadlocks for direct I/O · b01b2d72
      Andreas Gruenbacher committed
      Also disable page faults during direct I/O requests and implement a
      similar kind of retry logic as in the buffered I/O case.
      
      The retry logic in the direct I/O case differs from the buffered I/O
      case in the following way: direct I/O doesn't provide the kinds of
      consistency guarantees between concurrent reads and writes that buffered
      I/O provides, so once we lose the inode glock while faulting in user
      pages, we always resume the operation.  We never need to return a
      partial read or write.
      
      This locking problem was originally reported by Jan Kara.  Linus came up
      with the idea of disabling page faults.  Many thanks to Al Viro and
      Matthew Wilcox for their feedback.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b01b2d72
  9. 24 Oct, 2021 2 commits
    • A
      iomap: Add done_before argument to iomap_dio_rw · 4fdccaa0
      Andreas Gruenbacher committed
      Add a done_before argument to iomap_dio_rw that indicates how much of
      the request has already been transferred.  When the request succeeds, we
      report that done_before additional bytes were transferred.  This is
      useful for finishing a request asynchronously when part of the request
      has already been completed synchronously.
      
      We'll use that to allow iomap_dio_rw to be used with page faults
      disabled: when a page fault occurs while submitting a request, we
      synchronously complete the part of the request that has already been
      submitted.  The caller can then take care of the page fault and call
      iomap_dio_rw again for the rest of the request, passing in the number of
      bytes already transferred.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      4fdccaa0
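The accounting behind done_before can be sketched with a userspace model. `model_dio_rw` and `model_dio_loop` are hypothetical; they only imitate the reporting rule described above (a successful call returns the newly transferred bytes plus done_before) and do not reflect the real iomap_dio_rw signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of one direct I/O call with page faults disabled:
 * transfers up to `resident` of the `remaining` bytes, and on success
 * reports the new bytes plus what earlier calls already transferred.
 */
static long model_dio_rw(size_t remaining, size_t resident, size_t done_before)
{
	size_t n = remaining < resident ? remaining : resident;

	if (n == 0 && remaining > 0)
		return -EFAULT;		/* no progress: pages must be faulted in */
	return (long)(done_before + n);
}

/*
 * Hypothetical caller: after a short transfer it would fault in pages
 * (modeled by resident_per_try[i]) and call again for the rest, passing
 * the running total as done_before.
 */
static long model_dio_loop(size_t len, const size_t *resident_per_try, int tries)
{
	size_t done = 0;

	for (int i = 0; i < tries && done < len; i++) {
		long ret = model_dio_rw(len - done, resident_per_try[i], done);

		if (ret > 0)
			done = (size_t)ret;	/* ret already includes done_before */
	}
	return (long)done;
}
```

Because each successful return already includes the earlier bytes, the caller replaces its running total instead of adding to it, and the final return value is the size of the whole request.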
    • A
      gfs2: Fix mmap + page fault deadlocks for buffered I/O · 00bfe02f
      Andreas Gruenbacher committed
      In the .read_iter and .write_iter file operations, we're accessing
      user-space memory while holding the inode glock.  There is a possibility
      that the memory is mapped to the same file, in which case we'd recurse
      on the same glock.
      
      We could detect and work around this simple case of recursive locking,
      but more complex scenarios exist that involve multiple glocks,
      processes, and cluster nodes, and working around all of those cases
      isn't practical or even possible.
      
      Avoid these kinds of problems by disabling page faults while holding the
      inode glock.  If a page fault would occur, we either end up with a
      partial read or write or with -EFAULT if nothing could be read or
      written.  In either case, we know that we're not done with the
      operation, so we indicate that we're willing to give up the inode glock
      and then we fault in the missing pages.  If that made us lose the inode
      glock, we return a partial read or write.  Otherwise, we resume the
      operation.
      
      This locking problem was originally reported by Jan Kara.  Linus came up
      with the idea of disabling page faults.  Many thanks to Al Viro and
      Matthew Wilcox for their feedback.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      00bfe02f
  10. 21 Oct, 2021 1 commit
    • A
      gfs2: Eliminate ip->i_gh · 1b223f70
      Andreas Gruenbacher committed
      Now that gfs2_file_buffered_write is the only remaining user of
      ip->i_gh, we can move the glock holder to the stack (or rather, use the
      one we already have on the stack); there is no need for keeping the
      holder in the inode anymore.
      
      This is slightly complicated by the fact that we're using ip->i_gh for
      the statfs inode in gfs2_file_buffered_write as well.  Writing to the
      statfs inode isn't very common, so allocate the statfs holder
      dynamically when needed.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      1b223f70