1. 05 Dec, 2021 2 commits
  2. 02 Dec, 2021 1 commit
  3. 08 Nov, 2021 1 commit
  4. 06 Nov, 2021 1 commit
  5. 25 Oct, 2021 12 commits
    • gfs2: check context in gfs2_glock_put · 660a6126
      Alexander Aring committed
      Add a might_sleep call into gfs2_glock_put which can sleep in DLM when
      the last reference is released.  This will show problems earlier, and
      not only when the last reference is put.
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix glock_hash_walk bugs · 7427f3bb
      Andreas Gruenbacher committed
      So far, glock_hash_walk took a reference on each glock it iterated over, and it
      was the examiner's responsibility to drop those references.  Dropping the final
      reference to a glock can sleep and the examiners are called in an RCU critical
      section with spin locks held, so examiners that didn't need the extra reference
      had to drop it asynchronously via gfs2_glock_queue_put or similar.  This wasn't
      done correctly in thaw_glock which did call gfs2_glock_put, and not at all in
      dump_glock_func.
      
      Change glock_hash_walk to not take glock references at all.  That way, the
      examiners that don't need them won't have to bother with slow asynchronous
      puts, and the examiners that do need references can take them themselves.
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
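
The ownership rule described in the commit message above can be illustrated with a small userspace sketch. This is not the kernel code: `sketch_glock`, `sketch_hash_walk` and both examiners are hypothetical names, and a plain integer stands in for the glock's lockref count. The walker takes no references, so an examiner that only inspects has nothing to put, while an examiner that wants to keep a glock takes its own reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a glock: a reference count plus a marker. */
struct sketch_glock {
    int refcount;   /* models the lockref count */
    int wanted;     /* examiner-specific: should this glock be kept? */
};

typedef void (*sketch_examiner_t)(struct sketch_glock *gl, void *data);

/* Walk every glock WITHOUT taking a reference on the examiner's behalf. */
static void sketch_hash_walk(struct sketch_glock *tbl, size_t n,
                             sketch_examiner_t examiner, void *data)
{
    for (size_t i = 0; i < n; i++)
        examiner(&tbl[i], data);
}

/* Examiner that only inspects: no reference taken, nothing to drop. */
static void sketch_count_examiner(struct sketch_glock *gl, void *data)
{
    (void)gl;
    (*(size_t *)data)++;
}

/* Examiner that needs the glock later: it takes its own reference. */
struct sketch_keep_list {
    struct sketch_glock *kept[8];
    size_t n;
};

static void sketch_keep_examiner(struct sketch_glock *gl, void *data)
{
    struct sketch_keep_list *list = data;

    if (gl->wanted && list->n < 8) {
        gl->refcount++;               /* reference owned by the examiner */
        list->kept[list->n++] = gl;
    }
}
```

With this split, the counting examiner never has to schedule an asynchronous put, and only the examiner that stored a pointer is responsible for dropping the reference later, outside the walk.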
    • gfs2: Cancel remote delete work asynchronously · 486408d6
      Andreas Gruenbacher committed
      In gfs2_inode_lookup and gfs2_create_inode, we're calling
      gfs2_cancel_delete_work which currently cancels any remote delete work
      (delete_work_func) synchronously.  This means that if the work is
      currently running, it will wait for it to finish.  We're doing this to
      prevent a previous instance of an inode from having any influence on the
      next instance.
      
      However, delete_work_func uses gfs2_inode_lookup internally, and we can
      end up in a deadlock when delete_work_func gets interrupted at the wrong
      time.  For example,
      
        (1) An inode's iopen glock has delete work queued, but the inode
            itself has been evicted from the inode cache.
      
        (2) The delete work is preempted before reaching gfs2_inode_lookup.
      
        (3) Another process recreates the inode (gfs2_create_inode).  It tries
            to cancel any outstanding delete work, which blocks waiting for
            the ongoing delete work to finish.
      
        (4) The delete work calls gfs2_inode_lookup, which blocks waiting for
            gfs2_create_inode to instantiate and unlock the new inode =>
            deadlock.
      
      It turns out that when the delete work notices that its inode has been
      re-instantiated, it will do nothing.  This means that it's safe to
      cancel the delete work asynchronously.  This prevents the kind of
      deadlock described above.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: fix GL_SKIP node_scope problems · f2e70d8f
      Bob Peterson committed
      Before this patch, when a glock was locked, the very first holder on the
      queue would unlock the lockref and call the go_instantiate glops function
      (if one existed), unless GL_SKIP was specified. When we introduced the new
      node-scope concept, we allowed multiple holders to lock glocks in EX mode
      and share the lock.
      
      But node-scope introduced a new problem: if the first holder has GL_SKIP
      and the next one does NOT, since it is not the first holder on the queue,
      the go_instantiate op was not called. Eventually the GL_SKIP holder may
      call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
      still a window of time in which another non-GL_SKIP holder assumes the
      instantiate function had been called by the first holder. In the case of
      rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.
      
      This patch tries to fix the problem by introducing two new glock flags:
      
      GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
      needs to be called to "fill in" or "read in" the object before it is
      referenced.
      
      GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
      in the process of reading in the object. Whenever a function needs to
      reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
      set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
      function.
      
      As before, the gl_lockref spin_lock is unlocked during the IO operation,
      which may take a relatively long amount of time to complete. While
      unlocked, if another process determines go_instantiate is still needed,
      it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
      glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
      it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
      go_instantiate operation may not have been successful.
      
      Functions that previously called the instantiate sub-functions now call
      directly into gfs2_instantiate so the new bits are managed properly.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
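
The two-flag handshake described in the commit message above can be sketched as a small state machine. This is a simplified userspace model, not the kernel's gfs2_instantiate: the `SK_*` constants, `sketch_instantiate_begin` and `sketch_instantiate_end` are hypothetical names, and the spin lock and the actual waiting are left out. It shows only the flag transitions, including the re-check after a failed instantiation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; they only model the two glock flags. */
enum {
    SK_INSTANTIATE_NEEDED  = 1 << 0,  /* models GLF_INSTANTIATE_NEEDED */
    SK_INSTANTIATE_IN_PROG = 1 << 1,  /* models GLF_INSTANTIATE_IN_PROG */
};

struct sketch_glock {
    unsigned int flags;
};

enum sketch_step {
    SK_READY,      /* object is already instantiated, nothing to do */
    SK_MUST_WAIT,  /* another holder is instantiating: wait, then re-check */
    SK_STARTED,    /* caller claimed the job and must perform the I/O */
};

/* Decide what a holder must do before referencing the object. */
static enum sketch_step sketch_instantiate_begin(struct sketch_glock *gl)
{
    if (!(gl->flags & SK_INSTANTIATE_NEEDED))
        return SK_READY;
    if (gl->flags & SK_INSTANTIATE_IN_PROG)
        return SK_MUST_WAIT;
    gl->flags |= SK_INSTANTIATE_IN_PROG;
    return SK_STARTED;
}

/* Finish an instantiation attempt; NEEDED stays set if the I/O failed. */
static void sketch_instantiate_end(struct sketch_glock *gl, bool io_ok)
{
    if (io_ok)
        gl->flags &= ~(unsigned int)SK_INSTANTIATE_NEEDED;
    gl->flags &= ~(unsigned int)SK_INSTANTIATE_IN_PROG;
}
```

A waiter that got `SK_MUST_WAIT` calls `sketch_instantiate_begin` again once the in-progress flag is cleared. If the first attempt failed, the needed flag is still set, so the waiter receives `SK_STARTED` and retries the instantiation itself.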
    • gfs2: split glock instantiation off from do_promote · e6f85600
      Bob Peterson committed
      Before this patch, function do_promote had a section of code that did
      the actual instantiation.  This patch splits that off into its own
      function, gfs2_instantiate, which prepares us for the next patch that
      will use that function.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: further simplify do_promote · 60d8bae9
      Bob Peterson committed
      This patch further simplifies function do_promote by eliminating some
      redundant code in favor of using a lock_released flag. This is just
      prep work for a future patch.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: re-factor function do_promote · 17a6ecee
      Bob Peterson committed
      This patch simply re-factors function do_promote to reduce the indents.
      The logic should be unchanged. This makes future patches more readable.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Remove 'first' trace_gfs2_promote argument · d74d0ce5
      Andreas Gruenbacher committed
      Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the
      'first' holder isn't the one that instantiates the glock
      (gl_instantiate), which is what the 'first' flag was apparently supposed
      to indicate.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: change go_lock to go_instantiate · 3278b977
      Bob Peterson committed
      Before this patch, the go_lock glock operations (glops) did not do
      any actual locking. They were used to instantiate objects, like reading
      in dinodes and rgrps from the media.
      
      This patch renames the functions to go_instantiate for clarity.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Save ip from gfs2_glock_nq_init · b016d9a8
      Andreas Gruenbacher committed
      Before this patch, when a glock was locked by function gfs2_glock_nq_init,
      it initialized the holder gh_ip (return address) as gfs2_glock_nq_init.
      That made it extremely difficult to track down problems because many
      functions call gfs2_glock_nq_init. This patch changes the function so
      that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes
      it easy to backtrack which holder took the lock.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: move GL_SKIP check from glops to do_promote · c1442f6b
      Bob Peterson committed
      Before this patch, each individual "go_lock" glock operation (glop)
      checked the GL_SKIP flag, and if set, would skip further processing.
      
      This patch changes the logic so the go_lock caller, function do_promote,
      checks the GL_SKIP flag before calling the go_lock op in the first place.
      This avoids having to unnecessarily unlock gl_lockref.lock only to
      re-lock it again.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Add GL_SKIP holder flag to dump_holder · 4c69038d
      Bob Peterson committed
      Somehow, the GL_SKIP flag was missed when dumping glock holders.
      This patch adds it to function hflags2str. I added it at the end because
      I wanted Holder and Skip flags together to read "Hs" rather than "sH"
      to avoid confusion with "Shared" ("SH") holder state.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
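
The ordering argument in the commit message above (append the skip letter last so that Holder plus Skip reads "Hs") can be shown with a reduced sketch. This is not the kernel's hflags2str: the function name, the `SK_*` bit values and the subset of flags are assumptions made for illustration, and the 'W' letter for a waiting holder is included only as an example of another flag character.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit values; the real flags live in different words. */
#define SK_HIF_HOLDER (1u << 0)
#define SK_HIF_WAIT   (1u << 1)
#define SK_GL_SKIP    (1u << 2)

/*
 * Render holder flags as a short string. The skip letter is emitted
 * last, so Holder + Skip prints "Hs" and cannot be misread as the
 * "SH" (Shared) holder state.
 */
static const char *sketch_hflags2str(char *buf, unsigned int flags)
{
    char *p = buf;

    if (flags & SK_HIF_HOLDER)
        *p++ = 'H';
    if (flags & SK_HIF_WAIT)
        *p++ = 'W';
    if (flags & SK_GL_SKIP)
        *p++ = 's';
    *p = '\0';
    return buf;
}
```

The caller supplies a buffer with room for one character per flag plus the terminating NUL, four bytes in this reduced version.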
  6. 21 Oct, 2021 2 commits
    • gfs2: Introduce flag for glock holder auto-demotion · dc732906
      Bob Peterson committed
      This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
      will allow glocks to be demoted automatically on locking conflicts.
      When a locking request comes in that isn't compatible with the locking
      state of an active holder and that holder has the HIF_MAY_DEMOTE flag
      set, the holder will be demoted before the incoming locking request is
      granted.
      
      Note that this mechanism demotes active holders (with the HIF_HOLDER
      flag set), while before we were only demoting glocks without any active
      holders.  This allows processes to keep hold of locks that may form a
      cyclic locking dependency; the core glock logic will then break those
      dependencies in case a conflicting locking request occurs.  We'll use
      this to avoid giving up the inode glock proactively before faulting in
      pages.
      
      Processes that allow a glock holder to be taken away indicate this by
      calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
      Later, they call gfs2_holder_disallow_demote() to clear the flag again,
      and then they check if their holder is still queued: if it is, they are
      still holding the glock; if it isn't, they can re-acquire the glock (or
      abort).
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
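
The allow/disallow pattern described in the commit message above can be modeled in a few lines of userspace C. This is a sketch under simplifying assumptions, not the kernel implementation: all `sk_*` names are hypothetical, only shared and exclusive states are modeled, and a conflicting request is granted only if every conflicting holder has opted in to demotion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_state { SK_SHARED, SK_EXCLUSIVE };

struct sk_holder {
    enum sk_state state;
    bool queued;      /* still on the glock's holder list */
    bool may_demote;  /* models HIF_MAY_DEMOTE */
};

static void sk_allow_demote(struct sk_holder *gh)    { gh->may_demote = true; }
static void sk_disallow_demote(struct sk_holder *gh) { gh->may_demote = false; }

/* In this reduced model only shared + shared is compatible. */
static bool sk_compatible(enum sk_state a, enum sk_state b)
{
    return a == SK_SHARED && b == SK_SHARED;
}

/*
 * Handle an incoming request. If any conflicting holder has not allowed
 * demotion, the request must wait and nothing changes. Otherwise all
 * conflicting holders are dequeued and the request can be granted.
 */
static bool sk_request(struct sk_holder *holders, size_t n, enum sk_state want)
{
    for (size_t i = 0; i < n; i++) {
        struct sk_holder *gh = &holders[i];

        if (gh->queued && !sk_compatible(gh->state, want) && !gh->may_demote)
            return false;
    }
    for (size_t i = 0; i < n; i++) {
        struct sk_holder *gh = &holders[i];

        if (gh->queued && !sk_compatible(gh->state, want))
            gh->queued = false;   /* holder was demoted behind its back */
    }
    return true;
}
```

A process following the pattern calls `sk_allow_demote` before an operation that may block, calls `sk_disallow_demote` afterwards, and then checks `queued`: if it is false, the lock was taken away and has to be re-acquired (or the operation aborted).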
    • gfs2: Clean up function may_grant · 61444649
      Andreas Gruenbacher committed
      Pass the first current glock holder into function may_grant and
      deobfuscate the logic there.
      
      While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant.  To make
      that build cleanly, de-constify the may_grant arguments.
      
      We're now using function find_first_holder in do_promote, so move the
      function's definition above do_promote.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  7. 20 Aug, 2021 2 commits
  8. 28 Jun, 2021 1 commit
  9. 31 May, 2021 1 commit
  10. 20 May, 2021 2 commits
    • gfs2: fix a deadlock on withdraw-during-mount · 865cc3e9
      Bob Peterson committed
      Before this patch, gfs2 would deadlock because of the following
      sequence during mount:
      
      mount
         gfs2_fill_super
            gfs2_make_fs_rw <--- Detects IO error with glock
               kthread_stop(sdp->sd_quotad_process);
                  <--- Blocked waiting for quotad to finish
      
      logd
         Detects IO error and the need to withdraw
         calls gfs2_withdraw
            gfs2_make_fs_ro
               kthread_stop(sdp->sd_quotad_process);
                  <--- Blocked waiting for quotad to finish
      
      gfs2_quotad
         gfs2_statfs_sync
            gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted
      
      glock_work_func
         do_xmote <---Detects IO error, can't release glock: blocked on withdraw
            glops->go_inval
            glock_blocked_by_withdraw
               requeue glock work & exit <--- work requeued, blocked by withdraw
      
      This patch makes a special exception for the statfs system inode glock,
      which allows the statfs glock UNLOCK to proceed normally. That allows the
      quotad daemon to exit during the withdraw, which allows the logd daemon
      to exit during the withdraw, which allows the mount to exit.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: fix scheduling while atomic bug in glocks · 20265d9a
      Bob Peterson committed
      Before this patch, in the unlikely event that gfs2_glock_dq encountered
      a withdraw, it would do a wait_on_bit to wait for its journal to be
      recovered, but it never released the glock's spin_lock, which caused a
      scheduling-while-atomic error.
      
      This patch unlocks the lockref spin_lock before waiting for recovery.
      
      Fixes: 601ef0d5 ("gfs2: Force withdraw to replay journals and wait for it to finish")
      Cc: stable@vger.kernel.org # v5.7+
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  11. 06 May, 2021 1 commit
  12. 10 Apr, 2021 1 commit
  13. 09 Apr, 2021 1 commit
  14. 04 Apr, 2021 1 commit
  15. 18 Feb, 2021 1 commit
    • gfs2: Allow node-wide exclusive glock sharing · 06e908cd
      Bob Peterson committed
      Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
      glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
      set, the exclusive lock is shared among all local processes who are
      holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
      From the point of view of other nodes, the lock is still held
      exclusively.
      
      A future patch will start using this flag to improve performance with
      rgrp sharing.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
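
The sharing rule described in the commit message above can be written as a small compatibility check. This is a sketch, not the kernel's may_grant: `sk_may_share`, the `SK_*` names and the flag bit value are hypothetical, and only the shared and exclusive states are modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_state { SK_SHARED, SK_EXCLUSIVE };

/* Hypothetical bit value that models LM_FLAG_NODE_SCOPE. */
#define SK_FLAG_NODE_SCOPE (1u << 0)

/*
 * Can a new local holder share the glock with a current local holder?
 * Shared holders always coexist. Exclusive holders coexist only when
 * both requested exclusive mode AND both set the node-scope flag.
 */
static bool sk_may_share(enum sk_state cur_state, unsigned int cur_flags,
                         enum sk_state new_state, unsigned int new_flags)
{
    if (cur_state == SK_SHARED && new_state == SK_SHARED)
        return true;
    if (cur_state == SK_EXCLUSIVE && new_state == SK_EXCLUSIVE)
        return (cur_flags & SK_FLAG_NODE_SCOPE) &&
               (new_flags & SK_FLAG_NODE_SCOPE);
    return false;
}
```

The check only concerns holders on the same node. Toward other nodes the lock is still held exclusively, so nothing in the cluster-wide protocol changes.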
  16. 01 Dec, 2020 1 commit
  17. 25 Nov, 2020 1 commit
    • gfs2: set lockdep subclass for iopen glocks · 515b269d
      Alexander Aring committed
      This patch introduces a new glops attribute to define the subclass of the
      glock lockref spinlock. This avoids the following lockdep warning, which
      occurs when we lock an inode lock while an iopen lock is held:
      
      ============================================
      WARNING: possible recursive locking detected
      5.10.0-rc3+ #4990 Not tainted
      --------------------------------------------
      kworker/0:1/12 is trying to acquire lock:
      ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20
      
      but task is already holding lock:
      ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&gl->gl_lockref.lock);
        lock(&gl->gl_lockref.lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by kworker/0:1/12:
       #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      stack backtrace:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Workqueue: delete_workqueue delete_work_func
      Call Trace:
       dump_stack+0x8b/0xb0
       __lock_acquire.cold+0x19e/0x2e3
       lock_acquire+0x150/0x410
       ? lockref_get+0x9/0x20
       _raw_spin_lock+0x27/0x40
       ? lockref_get+0x9/0x20
       lockref_get+0x9/0x20
       delete_work_func+0x188/0x260
       process_one_work+0x237/0x540
       worker_thread+0x4d/0x3b0
       ? process_one_work+0x540/0x540
       kthread+0x127/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  18. 03 Nov, 2020 1 commit
  19. 21 Oct, 2020 2 commits
  20. 15 Oct, 2020 3 commits
  21. 03 Aug, 2020 2 commits