1. 01 Dec 2020 — 1 commit
  2. 27 Nov 2020 — 1 commit
    • gfs2: Upgrade shared glocks for atime updates · 82e938bd
      Authored by Andreas Gruenbacher
      Commit 20f82999 ("gfs2: Rework read and page fault locking") lifted
      the glock lock taking from the low-level ->readpage and ->readahead
      address space operations to the higher-level ->read_iter file and
      ->fault vm operations.  The glocks are still taken in LM_ST_SHARED mode
      only.  On filesystems mounted without the noatime option, ->read_iter
      sometimes needs to update the atime as well, though.  Right now, this
      leads to a failed locking mode assertion in gfs2_dirty_inode.
      
      Fix that by introducing a new update_time inode operation.  There, if
      the glock is held non-exclusively, upgrade it to an exclusive lock.
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Fixes: 20f82999 ("gfs2: Rework read and page fault locking")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      82e938bd
  3. 26 Nov 2020 — 2 commits
    • gfs2: Don't freeze the file system during unmount · f39e7d3a
      Authored by Bob Peterson
      GFS2's freeze/thaw mechanism uses a special freeze glock to control its
      operation. It does this with a sync glock operation (glops.c) called
      freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the
      glops function causes the file system to be frozen. This is intended. However,
      GFS2's mount and unmount processes also hold the freeze glock to prevent other
      processes, perhaps on different cluster nodes, from mounting the frozen file
      system in read-write mode.
      
      Before this patch, freeze_go_sync had no check for whether a freeze
      is actually intended or whether the glock demote was caused by a normal
      unmount. As a result, unmount would try to freeze the very file system
      it was unmounting, which ends in a deadlock.
      
      This patch adds an additional check to freeze_go_sync so that demotes of the
      freeze glock are ignored if they come from the unmount process.
      
      Fixes: 20b32912 ("gfs2: Fix regression in freeze_go_sync")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      f39e7d3a
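      The added check can be modeled with a toy decision function (field and function names here are hypothetical, not the real freeze_go_sync code): a demote of the freeze glock should only queue freeze work when it comes from an actual freeze request, never from the unmount path, which would otherwise deadlock against itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the freeze_go_sync decision; names are invented. */
struct freeze_demote {
    bool unmounting;       /* demote caused by the unmount process */
    bool freeze_requested; /* a node requested the freeze glock in EX */
};

static bool should_queue_freeze_work(const struct freeze_demote *d)
{
    if (d->unmounting)
        return false;  /* ignore: unmount must not freeze itself */
    return d->freeze_requested;
}
```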
    • gfs2: check for empty rgrp tree in gfs2_ri_update · 77872151
      Authored by Bob Peterson
      If gfs2 tries to mount a (corrupt) file system that has no resource
      groups, it still tries to set preferences on the first one, which causes
      a kernel NULL pointer dereference. This patch adds a check to function
      gfs2_ri_update so this condition is detected and reported back as an
      error.
      
      Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      77872151
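      The shape of the guard can be sketched as follows (structures simplified and names hypothetical, not the real gfs2_ri_update): refuse to touch the first resource group when the rindex produced no entries at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a resource group; names are invented. */
struct rgrp { int preferred; };

static int set_rgrp_preferences(struct rgrp *rgrps, size_t count)
{
    if (count == 0)
        return -EINVAL;     /* corrupt fs: no resource groups at all */
    rgrps[0].preferred = 1; /* safe: at least one entry exists */
    return 0;
}
```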
  4. 25 Nov 2020 — 2 commits
    • gfs2: set lockdep subclass for iopen glocks · 515b269d
      Authored by Alexander Aring
      This patch introduces a new glops attribute to define the subclass of the
      glock lockref spinlock. This avoids the following lockdep warning, which
      occurs when we lock an inode lock while an iopen lock is held:
      
      ============================================
      WARNING: possible recursive locking detected
      5.10.0-rc3+ #4990 Not tainted
      --------------------------------------------
      kworker/0:1/12 is trying to acquire lock:
      ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20
      
      but task is already holding lock:
      ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&gl->gl_lockref.lock);
        lock(&gl->gl_lockref.lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by kworker/0:1/12:
       #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      stack backtrace:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Workqueue: delete_workqueue delete_work_func
      Call Trace:
       dump_stack+0x8b/0xb0
       __lock_acquire.cold+0x19e/0x2e3
       lock_acquire+0x150/0x410
       ? lockref_get+0x9/0x20
       _raw_spin_lock+0x27/0x40
       ? lockref_get+0x9/0x20
       lockref_get+0x9/0x20
       delete_work_func+0x188/0x260
       process_one_work+0x237/0x540
       worker_thread+0x4d/0x3b0
       ? process_one_work+0x540/0x540
       kthread+0x127/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      515b269d
    • gfs2: Fix deadlock dumping resource group glocks · 16e6281b
      Authored by Alexander Aring
      Commit 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
      introduced additional locking in gfs2_rgrp_go_dump, which is also used for
      dumping resource group glocks via debugfs.  However, on that code path, the
      glock spin lock is already taken in dump_glock, and taking it again in
      gfs2_glock2rgrp leads to deadlock.  This can be reproduced with:
      
        $ mkfs.gfs2 -O -p lock_nolock /dev/FOO
        $ mount /dev/FOO /mnt/foo
        $ touch /mnt/foo/bar
        $ cat /sys/kernel/debug/gfs2/FOO/glocks
      
      Fix that by not taking the glock spin lock inside the go_dump callback.
      
      Fixes: 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      16e6281b
  5. 18 Nov 2020 — 1 commit
    • gfs2: Fix regression in freeze_go_sync · 20b32912
      Authored by Bob Peterson
      Patch 541656d3 ("gfs2: freeze should work on read-only mounts") changed
      the check for glock state in function freeze_go_sync() from "gl->gl_state
      == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE".  That's wrong and it
      regressed gfs2's freeze/thaw mechanism because it caused only the freezing
      node (which requests the glock in EX) to queue freeze work.
      
      All nodes go through this go_sync code path during the freeze to drop their
      SHared hold on the freeze glock, allowing the freezing node to acquire it
      in EXclusive mode. But all the nodes must freeze access to the file system
      locally, so they ALL must queue freeze work. The freeze_work calls
      freeze_func, which makes a request to reacquire the freeze glock in SH,
      effectively blocking until the thaw from the EX holder. Once thawed, the
      freezing node drops its EX hold on the freeze glock, then the (blocked)
      freeze_func reacquires the freeze glock in SH again (on all nodes, including
      the freezer) so all nodes go back to a thawed state.
      
      This patch changes the check back to gl_state == LM_ST_SHARED like it was
      prior to 541656d3.
      
      Fixes: 541656d3 ("gfs2: freeze should work on read-only mounts")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      20b32912
  6. 13 Nov 2020 — 2 commits
    • gfs2: Fix case in which ail writes are done to jdata holes · 4e79e3f0
      Authored by Bob Peterson
      Patch b2a846db ("gfs2: Ignore journal log writes for jdata holes")
      tried (unsuccessfully) to fix a case in which writes were done to jdata
      blocks, the blocks were sent to the ail list, and then a punch_hole or
      truncate operation caused the blocks to be freed. In other words, the ail
      items were for jdata holes. Before b2a846db, a jdata hole caused function
      gfs2_block_map to return -EIO, which was eventually interpreted as an
      IO error to the journal, followed by a withdraw.
      
      This patch changes function gfs2_get_block_noalloc, which is only used
      for jdata writes, so it returns -ENODATA rather than -EIO, and when
      -ENODATA is returned to gfs2_ail1_start_one, the error is ignored.
      We can safely ignore it because gfs2_ail1_start_one is only called
      when the jdata pages have already been written and truncated, so the
      ail1 content no longer applies.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      4e79e3f0
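      A simplified model of this error translation (enum and function names invented for illustration, not the real gfs2 code): the jdata-only mapping helper turns a hole into -ENODATA, and the ail1 writer treats -ENODATA as "nothing to write" while any other error still triggers a withdraw.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mapping results for a toy block-mapping helper. */
enum map_result { BLOCK_MAPPED, BLOCK_HOLE, BLOCK_ERROR };

static int get_block_noalloc(enum map_result r)
{
    switch (r) {
    case BLOCK_MAPPED: return 0;
    case BLOCK_HOLE:   return -ENODATA; /* jdata hole: safe to ignore */
    default:           return -EIO;     /* genuine I/O error */
    }
}

static bool ail1_should_withdraw(int err)
{
    return err != 0 && err != -ENODATA; /* -ENODATA is not fatal */
}
```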
    • Revert "gfs2: Ignore journal log writes for jdata holes" · d3039c06
      Authored by Bob Peterson
      This reverts commit b2a846db.
      
      That commit changed the behavior of function gfs2_block_map to return
      -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
      false.  While that fixed the intended problem for jdata, it also broke
      other callers of gfs2_block_map such as some jdata block reads.  Before
      the patch, an encountered hole would be skipped and the buffer seen as
      unmapped by the caller.  The patch changed the behavior to return
      -ENODATA, which is interpreted as an error by the caller.
      
      The -ENODATA return code should be restricted to the specific case where
      jdata holes are encountered during ail1 writes.  That will be done in a
      later patch.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d3039c06
  7. 12 Nov 2020 — 1 commit
  8. 03 Nov 2020 — 2 commits
  9. 30 Oct 2020 — 6 commits
    • gfs2: check for live vs. read-only file system in gfs2_fitrim · c5c68724
      Authored by Bob Peterson
      Before this patch, gfs2_fitrim did not properly check for a "live" file
      system. If the file system had something to trim and the file system
      was read-only (or spectator), it would start the trim, but when it started
      the transaction, gfs2_trans_begin returned -EROFS (read-only file system)
      and it errored out. However, if the file system was already trimmed and
      there was no work to do, gfs2_trans_begin was never called; that code was
      bypassed, so the error was never returned. Instead, a good return code
      came back with 0 work done. All this made for inconsistent behavior:
      the same fstrim command could return -EROFS in one case and 0 in another.
      This tripped up xfstests generic/537 which reports the error as:
      
          +fstrim with unrecovered metadata just ate your filesystem
      
      This patch adds a check for a "live" (that is, active-journal, read-write)
      file system, and if there isn't one, returns the error properly.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c5c68724
    • gfs2: don't initialize statfs_change inodes in spectator mode · 7e5b9266
      Authored by Bob Peterson
      Before commit 97fd734b, the local statfs_changeX inodes were never
      initialized for spectator mounts, yet the unmount path still checked for
      them when unmounting everything. There's no good reason to look up the
      statfs_changeX files, because spectators cannot perform recovery. A
      spectator mount still needs the master statfs file for statfs calls,
      however. This patch adds a check for spectator mounts to init_statfs.
      
      Fixes: 97fd734b ("gfs2: lookup local statfs inodes prior to journal recovery")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      7e5b9266
    • gfs2: Split up gfs2_meta_sync into inode and rgrp versions · 4a55752a
      Authored by Bob Peterson
      Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write
      the address space for the metadata being synced. That's great for inodes, but
      resource groups all point to the same superblock address space, sdp->sd_aspace,
      and each rgrp has its own range of blocks on which it should operate. That meant
      every time an rgrp's metadata was synced, the whole address space was written
      instead of just the rgrp's range.
      
      This patch eliminates function gfs2_meta_sync and tailors specific metasync
      functions for inodes and rgrps.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      4a55752a
    • gfs2: init_journal's undo directive should also undo the statfs inodes · c4af59bd
      Authored by Bob Peterson
      Before this patch, function init_journal's "undo" directive jumped to label
      fail_jinode_gh. But now that it does statfs initialization, it needs to
      jump to fail_statfs instead. Failure to do so means that mount failures
      after init_journal is successful will neglect to let go of the proper
      statfs information, stranding the statfs_changeX inodes. That makes it
      impossible to free their glocks, and results in:
      
       gfs2: fsid=sda.s: G:  s:EX n:2/805f f:Dqob t:EX d:UN/603701000 a:0 v:0 r:4 m:200 p:1
       gfs2: fsid=sda.s:  H: s:EX f:H e:0 p:1397947 [(ended)] init_journal+0x548/0x890 [gfs2]
       gfs2: fsid=sda.s:  I: n:6/32863 t:8 f:0x00 d:0x00000201 s:24 p:0
       gfs2: fsid=sda.s: G:  s:SH n:5/805f f:Dqob t:SH d:UN/603712000 a:0 v:0 r:3 m:200 p:0
       gfs2: fsid=sda.s:  H: s:SH f:EH e:0 p:1397947 [(ended)] gfs2_inode_lookup+0x1fb/0x410 [gfs2]
       VFS: Busy inodes after unmount of sda. Self-destruct in 5 seconds.  Have a nice day...
      
      The next time the file system is mounted, it then reuses the same glocks,
      which ends in a kernel NULL pointer dereference when trying to dump the
      reused glock.
      
      This patch makes the "undo" function of init_journal jump to fail_statfs
      so the statfs files are properly deconstructed upon failure.
      
      Fixes: 97fd734b ("gfs2: lookup local statfs inodes prior to journal recovery")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c4af59bd
    • gfs2: Add missing truncate_inode_pages_final for sd_aspace · a9dd945c
      Authored by Bob Peterson
      Gfs2 creates an address space for its rgrps called sd_aspace, but it never
      called truncate_inode_pages_final on it. This greatly confused the VFS,
      which tried to reference the address space after gfs2 had freed the
      superblock that contained it.
      
      This patch adds a call to truncate_inode_pages_final for sd_aspace, thus
      avoiding the use-after-free.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a9dd945c
    • gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free · d0f17d38
      Authored by Bob Peterson
      Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling
      return_all_reservations, but return_all_reservations still dereferences
      rgd->rd_bits in __rs_deltree.  Fix that by moving the call to kfree below the
      call to return_all_reservations.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d0f17d38
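      The ordering fix can be illustrated with a minimal stand-in (not the real gfs2_rgrpd; structure and function names are simplified for illustration): every caller that still dereferences rd_bits must run before the free.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the rgrp descriptor; only the field that matters. */
struct rgrpd { int *rd_bits; };

static int return_all_reservations(struct rgrpd *rgd)
{
    return rgd->rd_bits[0]; /* still reads rd_bits */
}

static int clear_rgrpd(struct rgrpd *rgd)
{
    int v = return_all_reservations(rgd); /* use first ... */
    free(rgd->rd_bits);                   /* ... free afterwards */
    rgd->rd_bits = NULL;
    return v;
}
```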
  10. 23 Oct 2020 — 2 commits
    • gfs2: Recover statfs info in journal head · bedb0f05
      Authored by Abhi Das
      Apply the outstanding statfs changes in the journal head to the
      master statfs file. Zero out the local statfs file for good measure.
      
      Previously, statfs updates would be read in from the local statfs inode and
      synced to the master statfs inode during recovery.
      
      We now use the statfs updates in the journal head to update the master statfs
      inode instead of reading in from the local statfs inode. To preserve backward
      compatibility with kernels that can't do this, we still need to keep the
      local statfs inode up to date by writing changes to it. At some point in the
      future, we can do away with the local statfs inodes altogether and keep the
      statfs changes solely in the journal.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      bedb0f05
    • gfs2: lookup local statfs inodes prior to journal recovery · 97fd734b
      Authored by Abhi Das
      We need to lookup the master statfs inode and the local statfs
      inodes earlier in the mount process (in init_journal) so journal
      recovery can use them when it attempts to recover the statfs info.
      We lookup all the local statfs inodes and store them in a linked
      list to allow a node to recover statfs info for other nodes in the
      cluster.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      97fd734b
  11. 21 Oct 2020 — 5 commits
  12. 15 Oct 2020 — 15 commits
    • gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) · e2c6c8a7
      Authored by Bob Peterson
      Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
      when a glock had a holder queued. It was only checked for inode glocks,
      although set and cleared by all glocks, and it was only used to determine
      whether the glock should be held for the minimum hold time before releasing.
      
      The problem is that the flag was not accurate at all. When a process held
      the glock, the flag was set. When it dequeued the glock, the flag was only
      cleared in cases where the state actually changed. So if the state didn't
      change, the flag could still be set even when nothing was queued.
      
      This happens to iopen glocks often: they get held in SH, then the file is
      closed, but the glock remains in SH mode.
      
      We don't need a special flag to indicate this: we can simply check whether
      the glock has any items on its holders queue. Maintaining the flag is a
      waste of cpu time.
      
      This patch eliminates the flag in favor of simply checking list_empty
      on the glock holders.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      e2c6c8a7
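      A minimal sketch of the idea, deriving "has holders" directly from the holders list instead of a flag that can go stale. The tiny circular doubly linked list below mimics the kernel's struct list_head; the predicate name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list, kernel-style. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next; n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* The glock honors its minimum hold time only while a holder is queued. */
static bool use_min_hold_time(const struct list_head *gl_holders)
{
    return !list_empty(gl_holders);
}
```

      Unlike a separately maintained flag, list_empty cannot disagree with the list it summarizes, so there is no state to forget to clear.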
    • gfs2: Ignore journal log writes for jdata holes · b2a846db
      Authored by Bob Peterson
      When flushing out its ail1 list, gfs2_write_jdata_page calls function
      __block_write_full_page passing in function gfs2_get_block_noalloc.
      But there was a problem when a process wrote to a jdata file, then
      truncated it or punched a hole, leaving references to the blocks within
      the new hole in its ail list, which are to be written to the journal log.
      
      In writing them to the journal, after calling gfs2_block_map, function
      gfs2_get_block_noalloc determined that the (hole-punched) block was not
      mapped, so it returned -EIO to generic_writepages, which passed it back
      to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
      there was a real IO error writing to the journal.
      
      This might be a valid error when writing metadata to the journal, but for
      journaled data writes, it does not warrant a withdraw.
      
      This patch adds a check to function gfs2_block_map that makes an exception
      for journaled data writes that correspond to jdata holes: If the iomap
      get function returns a block type of IOMAP_HOLE, it instead returns
      -ENODATA which does not cause the withdraw. Other errors are returned as
      before.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b2a846db
    • gfs2: simplify gfs2_block_map · a6645745
      Authored by Bob Peterson
      Function gfs2_block_map had a lot of redundancy between its create and
      no_create paths. This patch simplifies the code to eliminate the redundancy.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a6645745
    • gfs2: Only set PageChecked if we have a transaction · 6302d6f4
      Authored by Bob Peterson
      With jdata writes, we frequently got into situations where gfs2 deadlocked
      because of this calling sequence:
      
      gfs2_ail1_start
         gfs2_ail1_flush - for every tr on the sd_ail1_list:
            gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
               generic_writepages
                  write_cache_pages, passing __writepage(), which
                     calls clear_page_dirty_for_io, which calls set_page_dirty,
                        which calls jdata_set_page_dirty, which sets PageChecked;
                  __writepage() then calls
                     mapping->a_ops->writepage, AKA gfs2_jdata_writepage
      
      However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
      ignores the write and redirties the page. The problem is that write_cache_pages
      calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
      in page-writeback.c starting with "Yes, Virginia". If it's jdata,
      set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
      That causes a conflict because it makes it look like the page has been
      redirtied by another writer, in which case we need to skip writing it and
      redirty the page. That ends up in a deadlock because it isn't a "real" writer
      and nothing will ever clear PageChecked.
      
      If we do have a real writer, it will have started a transaction. So this
      patch checks if a transaction is in use, and if not, it skips setting
      PageChecked. That way, the page will be dirtied, cleaned, and written
      appropriately.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      6302d6f4
    • gfs2: don't lock sd_ail_lock in gfs2_releasepage · 249ffe18
      Authored by Bob Peterson
      Patch 380f7c65 changed gfs2_releasepage
      so that it held the sd_ail_lock spin_lock for most of its processing.
      It did this for some mysterious undocumented bug somewhere in the
      evict code path. But in the nine years since, evict has been reworked
      and fixed many times, and so have the transactions and ail list.
      I can't see a reason to hold the sd_ail_lock unless it's protecting
      the actual ail lists hung off the transactions. Therefore, this patch
      removes the locking to increase speed and efficiency, and to further help
      us rework the log flush code to be more concurrent with transactions.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      249ffe18
    • gfs2: make gfs2_ail1_empty_one return the count of active items · 36c78309
      Authored by Bob Peterson
      This patch is one baby step toward simplifying the journal management.
      It simply changes function gfs2_ail1_empty_one from a void to an int and
      makes it return a count of active items. This allows the caller to check
      the return code rather than list_empty on the tr_ail1_list. This way
      we can, in a later patch, combine transaction ail1 and ail2 lists.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      36c78309
    • gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe · 68942870
      Authored by Bob Peterson
      Before this patch, when blocks were freed, it called gfs2_meta_wipe to
      take the metadata out of the pending journal blocks. It did this mostly
      by calling another function called gfs2_remove_from_journal. This is
      shortsighted because it does not do anything with jdata blocks which
      may also be in the journal.
      
      This patch expands the function so that it wipes out jdata blocks from
      the journal as well, and it wipes it from the ail1 list if it hasn't
      been written back yet. Since it now processes jdata blocks as well,
      the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.
      
      New function gfs2_ail1_wipe wants a static view of the ail list, so it
      locks the sd_ail_lock when removing items. To accomplish this, function
      gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
      the caller's responsibility to do so.
      
      I was going to make sd_ail_lock locking conditional, but the practice is
      generally frowned upon. For details, see: https://lwn.net/Articles/109066/
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      68942870
    • gfs2: enhance log_blocks trace point to show log blocks free · 97c5e43d
      Authored by Bob Peterson
      This patch adds some code to enhance the log_blocks trace point. It
      reports the number of free log blocks. This makes the trace point much
      more useful, especially for debugging performance problems when we can
      tell when the journal gets full and needs to wait for flushes, etc.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      97c5e43d
    • gfs2: add missing log_blocks trace points in gfs2_write_revokes · 77650bdb
      Authored by Bob Peterson
      Function gfs2_write_revokes was incrementing and decrementing the number
      of log blocks free, but there was never a log_blocks trace point for it.
      Thus, the free blocks from a log_blocks trace would jump around
      mysteriously.
      
      This patch adds the missing trace points so the trace makes more sense.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      77650bdb
    • gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm · 21b6924b
      Authored by Bob Peterson
      Since the function is only used for writing jdata pages, this patch
      simply renames function gfs2_write_full_page to a more appropriate
      name: gfs2_write_jdata_page. This makes the code easier to understand.
      
      The function was only called in one place, which passed in a pointer to
      function gfs2_get_block_noalloc. Since that function doesn't need to be
      passed in, this patch also eliminates the unnecessary parameter.
      
      I also took the liberty of cleaning up the function comments.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      21b6924b
    • gfs2: add validation checks for size of superblock · 0ddc5154
      Authored by Anant Thazhemadam
      In gfs2_check_sb(), no validation checks are performed with regards to
      the size of the superblock.
      syzkaller detected a slab-out-of-bounds bug that was primarily caused
      because the block size for a superblock was set to zero.
      A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
      Performing validation checks and ensuring that the size of the superblock
      is valid fixes this bug.
      
      Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Suggested-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
      [Minor code reordering.]
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      0ddc5154
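      The validation rule stated above (a power of 2 between 512 and PAGE_SIZE) can be sketched as a small predicate; PAGE_SIZE is assumed to be 4 KiB here purely for illustration, and the function name is invented.

```c
#include <assert.h>
#include <stdbool.h>

#define SB_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* A superblock block size is valid only if it is a power of two
 * between 512 and PAGE_SIZE, inclusive. */
static bool sb_size_valid(unsigned int bsize)
{
    if (bsize < 512 || bsize > SB_PAGE_SIZE)
        return false;
    return (bsize & (bsize - 1)) == 0; /* power-of-two test */
}
```

      The range check runs first, so a block size of zero (the syzkaller case) is rejected before the bit trick, which would otherwise accept it.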
    • gfs2: use-after-free in sysfs deregistration · c2a04b02
      Authored by Jamie Iles
      syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:
      
        Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228
      
        CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
        Hardware name: linux,dummy-virt (DT)
        Workqueue: events kobject_delayed_cleanup
        Call trace:
         dump_backtrace+0x0/0x4d8
         show_stack+0x34/0x48
         dump_stack+0x174/0x1f8
         print_address_description.constprop.0+0x5c/0x550
         kasan_report+0x13c/0x1c0
         __asan_report_load1_noabort+0x34/0x60
         memcmp+0xd0/0xd8
         gfs2_uevent+0xc4/0x188
         kobject_uevent_env+0x54c/0x1240
         kobject_uevent+0x2c/0x40
         __kobject_del+0x190/0x1d8
         kobject_delayed_cleanup+0x2bc/0x3b8
         process_one_work+0x96c/0x18c0
         worker_thread+0x3f0/0xc30
         kthread+0x390/0x498
         ret_from_fork+0x10/0x18
      
        Allocated by task 1110:
         kasan_save_stack+0x28/0x58
         __kasan_kmalloc.isra.0+0xc8/0xe8
         kasan_kmalloc+0x10/0x20
         kmem_cache_alloc_trace+0x1d8/0x2f0
         alloc_super+0x64/0x8c0
         sget_fc+0x110/0x620
         get_tree_bdev+0x190/0x648
         gfs2_get_tree+0x50/0x228
         vfs_get_tree+0x84/0x2e8
         path_mount+0x1134/0x1da8
         do_mount+0x124/0x138
         __arm64_sys_mount+0x164/0x238
         el0_svc_common.constprop.0+0x15c/0x598
         do_el0_svc+0x60/0x150
         el0_svc+0x34/0xb0
         el0_sync_handler+0xc8/0x5b4
         el0_sync+0x15c/0x180
      
        Freed by task 228:
         kasan_save_stack+0x28/0x58
         kasan_set_track+0x28/0x40
         kasan_set_free_info+0x24/0x48
         __kasan_slab_free+0x118/0x190
         kasan_slab_free+0x14/0x20
         slab_free_freelist_hook+0x6c/0x210
         kfree+0x13c/0x460
      
      Use the same pattern as f2fs + ext4 where the kobject destruction must
      complete before allowing the FS itself to be freed.  This means that we
      need an explicit free_sbd in the callers.
      
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Jamie Iles <jamie@nuviainc.com>
      [Also go to fail_free when init_names fails.]
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c2a04b02
    • gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump · 0e539ca1
      Authored by Andrew Price
      When an rindex entry is found to be corrupt, compute_bitstructs() calls
      gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:
      
          gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
      
      gfs2_rgrp_dump then dereferences the gl without checking it and we get
      
          BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280
      
      because there's no rgrp glock involved while reading the rindex on mount.
      
      Fix this by changing gfs2_rgrp_dump to take an rgrp argument.
      
      Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      0e539ca1
    • gfs2: use iomap for buffered I/O in ordered and writeback mode · 2164f9b9
      Authored by Christoph Hellwig
      Switch to using the iomap readpage and writepage helpers for all I/O in
      the ordered and writeback modes, and thus eliminate using buffer_heads
      for I/O in these cases.  The journaled data mode is left untouched.
      
      (Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
      to set_page_dirty instead of accidentally leaving the page / buffer clean.)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      2164f9b9
    • gfs2: call truncate_inode_pages_final for address space glocks · ee1e2c77
      Authored by Bob Peterson
      Before this patch, we were not calling truncate_inode_pages_final for the
      address space of glocks, which left the possibility of a leak. We now
      take care of the problem instead of complaining, and we do it during
      glock tear-down.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      ee1e2c77