1. 07 Aug, 2020 4 commits
  2. 06 Aug, 2020 1 commit
  3. 03 Aug, 2020 3 commits
  4. 17 Jul, 2020 1 commit
  5. 08 Jul, 2020 1 commit
    • A
      gfs2: Rework read and page fault locking · 20f82999
      Andreas Gruenbacher committed
      So far, gfs2 has taken the inode glocks inside the ->readpage and
      ->readahead address space operations.  Since commit d4388340 ("fs:
      convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed
      the pages to read ahead locked.  With that, the current holder of the
      inode glock may be trying to lock one of those pages while
      gfs2_readahead is trying to take the inode glock, resulting in a
      deadlock.
      
      Fix that by moving the lock taking to the higher-level ->read_iter file
      and ->fault vm operations.  This also gets rid of an ugly lock inversion
      workaround in gfs2_readpage.
      
      The cache consistency model of filesystems like gfs2 is such that if
      data is found in the page cache, the data is up to date and can be used
      without taking any filesystem locks.  If a page is not cached,
      filesystem locks must be taken before populating the page cache.
      
      To avoid taking the inode glock when the data is already cached,
      gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag
      set.  If that fails, the inode glock is taken and the operation is
      retried with the IOCB_NOIO flag cleared.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      20f82999
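The "try the cache first, lock only on a miss" flow described in this commit message can be sketched in user-space C. This is a minimal model, not the gfs2 code: `toy_file`, `toy_generic_read` and `toy_read_iter` are hypothetical names, the `noio` argument stands in for the IOCB_NOIO flag, and the `-EAGAIN` return on a cache miss is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the page cache and the inode glock. */
struct toy_file {
    bool cached;        /* is the data already in the "page cache"? */
    char cache[16];     /* cached contents */
    const char *disk;   /* backing store; only read with the lock held */
    int lock_count;     /* how often the "inode glock" was taken */
};

/* Serve a read from the cache; with noio set, refuse to populate it. */
static int toy_generic_read(struct toy_file *f, char *buf, size_t len, bool noio)
{
    assert(len <= sizeof(f->cache));
    if (!f->cached) {
        if (noio)
            return -EAGAIN;     /* would need I/O: let the caller retry */
        strncpy(f->cache, f->disk, sizeof(f->cache) - 1);
        f->cache[sizeof(f->cache) - 1] = '\0';
        f->cached = true;
    }
    memcpy(buf, f->cache, len);
    return (int)len;
}

/* Lockless fast path first; take the lock only if that attempt fails. */
static int toy_read_iter(struct toy_file *f, char *buf, size_t len)
{
    int ret = toy_generic_read(f, buf, len, true);

    if (ret != -EAGAIN)
        return ret;
    f->lock_count++;            /* models taking the inode glock */
    ret = toy_generic_read(f, buf, len, false);
    /* the glock would be released here */
    return ret;
}
```

In this model, the first read of an uncached file takes the lock once, and later reads of the same data are served without touching it.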
  6. 03 Jul, 2020 5 commits
    • B
      gfs2: The freeze glock should never be frozen · c860f8ff
      Bob Peterson committed
      Before this patch, some gfs2 code locked the freeze glock with the
      LM_FLAG_NOEXP (do not freeze) flag, and some did not. We never want to
      freeze the freeze glock, so this patch makes all callers use LM_FLAG_NOEXP.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      c860f8ff
    • B
      gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE · 623ba664
      Bob Peterson committed
      Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
      several places. That's wrong because we always want to know whether
      the file system is frozen.
      
      There was also a problem with freeze/thaw transitioning the glock from
      frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
      to processes that request it in SH mode, unless GL_EXACT is specified.
      Therefore, the freeze/thaw code, which tried to reacquire the glock in
      SH mode would get the glock in EX mode, and miss the transition from EX
      to SH. That made it think the thaw had completed normally, but since the
      glock was still cached in EX, other nodes could not freeze again.
      
      This patch removes the GL_NOCACHE flag to allow the freeze glock to be
      cached. It also adds the GL_EXACT flag so the glock is fully transitioned
      from EX to SH, thereby allowing future freeze operations.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      623ba664
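The granting rule this commit message describes (a glock held in EX satisfies an SH request unless GL_EXACT is given) can be modelled as a small predicate. This is only a sketch of the rule as stated above: `toy_state`, `toy_may_grant` and the `exact` parameter are hypothetical names, and the real glock state machine has more states and flags than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified lock states: unlocked, shared, exclusive. */
enum toy_state { TOY_UN, TOY_SH, TOY_EX };

/*
 * Can a request for `wanted` be satisfied by a lock currently held in
 * `held` without a state transition?  `exact` models GL_EXACT.
 */
static bool toy_may_grant(enum toy_state held, enum toy_state wanted, bool exact)
{
    assert(wanted != TOY_UN);   /* requests are always for SH or EX */
    if (held == wanted)
        return true;
    if (exact)
        return false;           /* exact requests force a real transition */
    return held == TOY_EX && wanted == TOY_SH;
}
```

Without `exact`, an SH request against an EX-held lock is granted as-is, which is how the thaw path could miss the EX to SH transition. With `exact`, the mismatch is reported and the lock has to be converted.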
    • B
      gfs2: read-only mounts should grab the sd_freeze_gl glock · b780cc61
      Bob Peterson committed
      Before this patch, only read-write mounts would grab the freeze
      glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze
      glock was never initialized. That meant requests to freeze, which
      request the glock in EX, were granted without any state transition.
      That meant you could mount a gfs2 file system, which is currently
      frozen on a different cluster node, in read-only mode.
      
      This patch makes read-only mounts lock the freeze glock in SH mode,
      which will block for file systems that are frozen on another node.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      b780cc61
    • B
      gfs2: freeze should work on read-only mounts · 541656d3
      Bob Peterson committed
      Before this patch, function freeze_go_sync, called when promoting
      the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag.
      That's only set for read-write mounts. Read-only mounts don't use a
      journal, so the bit is never set, so the freeze never happened.
      
      This patch removes the check for SDF_JOURNAL_LIVE for freeze requests
      but still checks it when deciding whether to flush a journal.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      541656d3
    • B
      gfs2: eliminate GIF_ORDERED in favor of list_empty · 7542486b
      Bob Peterson committed
      In several places, we used the GIF_ORDERED inode flag to determine
      if an inode was on the ordered writes list. However, since we always
      held the sd_ordered_lock spin_lock during the manipulation, we can
      just as easily check list_empty(&ip->i_ordered) instead.
      This allows us to keep more than one ordered writes list to make
      journal writing improvements.
      
      This patch eliminates GIF_ORDERED in favor of checking list_empty.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      7542486b
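The idea of using list membership itself as the "is ordered" indicator can be illustrated with a minimal user-space list. This is a sketch, not the kernel's `list_head` API: the `toy_` helpers are hypothetical re-implementations, and the locking the commit message mentions (sd_ordered_lock) is only noted in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct toy_list { struct toy_list *next, *prev; };

static void toy_list_init(struct toy_list *h) { h->next = h; h->prev = h; }

static bool toy_list_empty(const struct toy_list *h) { return h->next == h; }

static void toy_list_add(struct toy_list *entry, struct toy_list *head)
{
    assert(toy_list_empty(entry));  /* entry must not be on a list already */
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink the entry and re-initialize it so that it tests as "empty" again. */
static void toy_list_del_init(struct toy_list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    toy_list_init(entry);
}

struct toy_inode { struct toy_list i_ordered; };

/* No separate flag: the inode is "ordered" iff its list node is linked. */
static bool toy_inode_is_ordered(const struct toy_inode *ip)
{
    return !toy_list_empty(&ip->i_ordered);
}

/* Callers would hold the ordered-list spinlock around these two helpers. */
static void toy_ordered_add(struct toy_inode *ip, struct toy_list *ordered)
{
    if (!toy_inode_is_ordered(ip))
        toy_list_add(&ip->i_ordered, ordered);
}

static void toy_ordered_del(struct toy_inode *ip)
{
    if (toy_inode_is_ordered(ip))
        toy_list_del_init(&ip->i_ordered);
}
```

Because the check does not name a particular list head, the same test keeps working if the inode is moved between several ordered-writes lists.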
  7. 30 Jun, 2020 3 commits
  8. 06 Jun, 2020 14 commits
    • B
      gfs2: fix use-after-free on transaction ail lists · 83d060ca
      Bob Peterson committed
      Before this patch, transactions could be merged into the system
      transaction by function gfs2_merge_trans(), but the transaction ail
      lists were never merged. Because the ail flushing mechanism can run
      separately, bd elements can be attached to the transaction's buffer
      list during the transaction (trans_add_meta, etc) but quickly moved
      to its ail lists. Later, the transaction can be freed by
      gfs2_trans_end while it still has bd elements
      queued to its ail lists, which can cause it to either lose track of
      the bd elements altogether (memory leak) or worse, reference the bd
      elements after the parent transaction has been freed.
      
      Although I've not seen any serious consequences, the problem becomes
      apparent with the previous patch's addition of:
      
      	gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
      
      to function gfs2_trans_free().
      
      This patch adds logic into gfs2_merge_trans() to move the merged
      transaction's ail lists to the sdp transaction. This prevents the
      use-after-free. To do this properly, we need to hold the ail lock,
      so we pass sdp into the function instead of the transaction itself.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      83d060ca
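The fix described in this commit message, moving a merged transaction's ail lists to the surviving transaction before the merged one is freed, can be sketched with a list splice. This is a user-space model under stated assumptions: all `toy_` names are hypothetical, only one ail list is shown, and the ail lock that the real code must hold is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_list { struct toy_list *next, *prev; };

static void toy_list_init(struct toy_list *h) { h->next = h; h->prev = h; }

static bool toy_list_empty(const struct toy_list *h) { return h->next == h; }

static void toy_list_add_tail(struct toy_list *entry, struct toy_list *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Move every element of `from` to the tail of `to`, leaving `from` empty. */
static void toy_list_splice_tail_init(struct toy_list *from, struct toy_list *to)
{
    struct toy_list *first, *last;

    if (toy_list_empty(from))
        return;
    first = from->next;
    last = from->prev;
    first->prev = to->prev;
    to->prev->next = first;
    last->next = to;
    to->prev = last;
    toy_list_init(from);
}

struct toy_trans { struct toy_list tr_ail1_list; };

/* Merge `tr` into the system transaction `sys`, including its ail list. */
static void toy_merge_trans(struct toy_trans *sys, struct toy_trans *tr)
{
    /* the real code would hold the ail lock across the splice */
    toy_list_splice_tail_init(&tr->tr_ail1_list, &sys->tr_ail1_list);
}

/* Freeing is only safe once nothing is queued on the transaction. */
static void toy_trans_free(struct toy_trans *tr)
{
    assert(toy_list_empty(&tr->tr_ail1_list));
    /* the memory would be released here */
}
```

After `toy_merge_trans`, the merged transaction's list is empty, so the check in `toy_trans_free` passes and no element still points into freed memory.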
    • B
      gfs2: new slab for transactions · b839dada
      Bob Peterson committed
      This patch adds a new slab for gfs2 transactions. That allows us to
      reduce kernel memory fragmentation and organize data better for
      analysis of vmcore dumps. A new centralized function is added to
      free the slab objects, and it exposes use-after-free by giving
      warnings if a transaction is freed while it still has bd elements
      attached to its buffers or ail lists. We make sure to initialize
      those transaction ail lists so we can check their integrity when freeing.
      
      At a later time, we should add a slab initialization function to
      make it more efficient, but for this initial patch I wanted to
      minimize the impact.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b839dada
    • B
      gfs2: initialize transaction tr_ailX_lists earlier · cbcc89b6
      Bob Peterson committed
      Since transactions may be freed shortly after they're created, before
      a log_flush occurs, we need to initialize their ail1 and ail2 lists
      earlier. Before this patch, the ail1 list was initialized in gfs2_log_flush().
      This moves the initialization to the point when the transaction is first
      created.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      cbcc89b6
    • A
      gfs2: Smarter iopen glock waiting · 9e8990de
      Andreas Gruenbacher committed
      When trying to upgrade the iopen glock from a shared to an exclusive lock in
      gfs2_evict_inode, abort the wait if there is contention on the corresponding
      inode glock: in that case, the inode must still be in active use on another
      node, and we're not guaranteed to get the iopen glock anytime soon.
      
      To make this work even better, when we notice contention on the iopen glock and
      we can't evict the corresponding inode and release the iopen glock immediately,
      poke the inode glock.  The other node(s) trying to acquire the lock can then
      abort instead of timing out.
      
      Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
      version of this patch.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      9e8990de
    • A
      gfs2: Wake up when setting GLF_DEMOTE · 35b6f8fb
      Andreas Gruenbacher committed
      Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE
      flag.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      35b6f8fb
    • A
      gfs2: Check inode generation number in delete_work_func · b0dcffd8
      Andreas Gruenbacher committed
      In delete_work_func, if the iopen glock still has an inode attached,
      limit the inode lookup to that specific generation number: in the likely
      case that the inode was deleted on the node on which the inode's link
      count dropped to zero, we can skip verifying the on-disk block type and
      reading in the inode.  The same applies if another node that had the
      inode open managed to delete the inode before us.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b0dcffd8
    • A
      gfs2: Move inode generation number check into gfs2_inode_lookup · b66648ad
      Andreas Gruenbacher committed
      Move the inode generation number check from gfs2_lookup_by_inum into
      gfs2_inode_lookup: gfs2_inode_lookup may be able to decide that an inode with
      the given inode generation number cannot exist without having to verify the
      block type or reading the inode from disk.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b66648ad
    • A
      gfs2: Minor gfs2_lookup_by_inum cleanup · 6bdcadea
      Andreas Gruenbacher committed
      Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
      generation number will qualify: a valid inode never has a zero no_formal_ino.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      6bdcadea
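The "zero means any generation" convention from this commit message can be shown as a one-line predicate. This is an illustrative sketch: `toy_inode` and `toy_generation_matches` are hypothetical names, and the struct carries only the two fields needed for the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the fields needed for the sketch; names follow the commit message. */
struct toy_inode {
    uint64_t no_addr;        /* block address */
    uint64_t no_formal_ino;  /* inode generation number, never 0 if valid */
};

/*
 * A zero no_formal_ino in the request acts as a wildcard, so callers do
 * not need to pass a pointer that may be NULL.
 */
static bool toy_generation_matches(const struct toy_inode *ip,
                                   uint64_t no_formal_ino)
{
    assert(ip->no_formal_ino != 0);  /* valid inodes never use the sentinel */
    return no_formal_ino == 0 || ip->no_formal_ino == no_formal_ino;
}
```

The in-band sentinel is safe only because zero is never a valid generation number, which is the property the commit message relies on.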
    • A
      gfs2: Try harder to delete inodes locally · 9e73330f
      Andreas Gruenbacher committed
      When an inode's link count drops to zero and the inode is cached on
      other nodes, the current behavior of gfs2 is to immediately give up and
      to rely on the other node(s) to delete the inode if there is iopen glock
      contention.  This leads to resource group glock bouncing and the loss of
      caching.  With the previous patches in place, we can fix that by not
      giving up immediately.
      
      When the inode is still open on other nodes, those nodes won't be able
      to evict the inode and give up the iopen glock.  In that case, our lock
      conversion request will time out.  The unlink system call will block for
      the duration of the iopen lock conversion request.  We're also holding
      the inode glock in EX mode for an extended duration, so other nodes
      won't be able to make progress on the inode, either.
      
      This is worse than what we had before, but we can prevent other nodes
      from getting stuck by aborting our iopen locking request if there is
      contention on the inode glock.  This will be the subject of a future
      patch.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      9e73330f
    • A
      gfs2: Give up the iopen glock on contention · 8c7b9262
      Andreas Gruenbacher committed
      When there's contention on the iopen glock, it means that the link count
      of the corresponding inode has dropped to zero on a remote node which is
      now trying to delete the inode.  In that case, try to evict the inode so
      that the iopen glock will be released, which will allow the remote node
      to do its job.
      
      When the inode is still open locally, the inode's reference count won't
      drop to zero and so we'll keep holding the inode and its iopen glock.
      The remote node will time out its request to grab the iopen glock, and
      when the inode is finally closed locally, we'll try to delete it
      ourself.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      8c7b9262
    • A
      gfs2: Turn gl_delete into a delayed work · a0e3cc65
      Andreas Gruenbacher committed
      This requires flushing delayed work items in gfs2_make_fs_ro (which is called
      before unmounting a filesystem).
      
      When inodes are deleted and then recreated, pending gl_delete work items would
      have no effect because the inode generations will have changed, so we can
      cancel any pending gl_delete works before reusing iopen glocks.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a0e3cc65
    • A
      gfs2: Keep track of deleted inode generations in LVBs · f286d627
      Andreas Gruenbacher committed
      When deleting an inode, keep track of the generation of the deleted inode in
      the inode glock Lock Value Block (LVB).  When trying to delete an inode
      remotely, check the last-known inode generation against the deleted inode
      generation to skip duplicate remote deletes.  This avoids taking the resource
      group glock in order to verify the block type.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      f286d627
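The duplicate-delete check described in this commit message can be sketched as a comparison against a generation number kept in shared lock state. This is a model under assumptions: `toy_lvb` and both helper names are hypothetical, generations are assumed to start at 1 and only grow, and the "less than or equal" comparison is this sketch's reading of "check the last-known inode generation against the deleted inode generation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-glock Lock Value Block shared between nodes. */
struct toy_lvb {
    uint64_t generation_deleted;  /* highest generation known to be deleted */
};

/* Record that the inode with this generation has been deleted. */
static void toy_remember_delete(struct toy_lvb *lvb, uint64_t generation)
{
    assert(generation != 0);      /* assumption: generations start at 1 */
    if (generation > lvb->generation_deleted)
        lvb->generation_deleted = generation;
}

/* True if a delete for this generation would be a duplicate. */
static bool toy_already_deleted(const struct toy_lvb *lvb, uint64_t generation)
{
    assert(generation != 0);
    return generation <= lvb->generation_deleted;
}
```

A node that sees `toy_already_deleted` return true can skip the delete without taking the resource group glock to verify the block type, which is the saving the commit message points to.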
    • B
      15f2547b
    • B
      gfs2: instrumentation wrt log_flush stuck · d5dc3d96
      Bob Peterson committed
      This adds checks for gfs2_log_flush being stuck, similarly to the check
      in gfs2_ail1_flush. To facilitate this and make the strings easy to grep,
      we move the ail1 emptying to its own function, empty_ail1_list.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d5dc3d96
  9. 05 Jun, 2020 2 commits
  10. 04 Jun, 2020 1 commit
  11. 03 Jun, 2020 5 commits