1. 01 Dec 2020 — 1 commit
  2. 27 Nov 2020 — 1 commit
    • gfs2: Upgrade shared glocks for atime updates · 82e938bd
      Authored by Andreas Gruenbacher
      Commit 20f82999 ("gfs2: Rework read and page fault locking") lifted
      the glock lock taking from the low-level ->readpage and ->readahead
      address space operations to the higher-level ->read_iter file and
      ->fault vm operations.  The glocks are still taken in LM_ST_SHARED mode
      only.  On filesystems mounted without the noatime option, ->read_iter
      sometimes needs to update the atime as well, though.  Right now, this
      leads to a failed locking mode assertion in gfs2_dirty_inode.
      
      Fix that by introducing a new update_time inode operation.  There, if
      the glock is held non-exclusively, upgrade it to an exclusive lock.
      Reported-by: Alexander Aring <aahringo@redhat.com>
      Fixes: 20f82999 ("gfs2: Rework read and page fault locking")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      82e938bd
  3. 26 Nov 2020 — 2 commits
    • gfs2: Don't freeze the file system during unmount · f39e7d3a
      Authored by Bob Peterson
      GFS2's freeze/thaw mechanism uses a special freeze glock to control its
      operation. It does this with a sync glock operation (glops.c) called
      freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the
      glops function causes the file system to be frozen. This is intended. However,
      GFS2's mount and unmount processes also hold the freeze glock to prevent other
      processes, perhaps on different cluster nodes, from mounting the frozen file
      system in read-write mode.
      
      Before this patch, freeze_go_sync had no check for whether a freeze
      is actually intended or whether the glock demote was caused by a normal
      unmount. As a result, unmount would try to freeze the very file system
      it was unmounting, which ends in a deadlock.
      
      This patch adds an additional check to freeze_go_sync so that demotes of the
      freeze glock are ignored if they come from the unmount process.
      
      Fixes: 20b32912 ("gfs2: Fix regression in freeze_go_sync")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      f39e7d3a
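      The added check can be modeled with a toy decision function (field and function names here are hypothetical, not the real freeze_go_sync code): a demote of the freeze glock should only queue freeze work when it comes from an actual freeze request, never from the unmount path, which would otherwise deadlock against itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the freeze_go_sync decision; names are invented. */
struct freeze_demote {
    bool unmounting;       /* demote caused by the unmount process */
    bool freeze_requested; /* a node requested the freeze glock in EX */
};

static bool should_queue_freeze_work(const struct freeze_demote *d)
{
    if (d->unmounting)
        return false;  /* ignore: unmount must not freeze itself */
    return d->freeze_requested;
}
```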
    • gfs2: check for empty rgrp tree in gfs2_ri_update · 77872151
      Authored by Bob Peterson
      If gfs2 tries to mount a (corrupt) file system that has no resource
      groups, it still tries to set preferences on the first one, which causes
      a kernel NULL pointer dereference. This patch adds a check to function
      gfs2_ri_update so this condition is detected and reported back as an
      error.
      
      Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      77872151
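      The shape of the guard can be sketched as follows (structures simplified and names hypothetical, not the real gfs2_ri_update): refuse to touch the first resource group when the rindex produced no entries at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a resource group; names are invented. */
struct rgrp { int preferred; };

static int set_rgrp_preferences(struct rgrp *rgrps, size_t count)
{
    if (count == 0)
        return -EINVAL;     /* corrupt fs: no resource groups at all */
    rgrps[0].preferred = 1; /* safe: at least one entry exists */
    return 0;
}
```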
  4. 25 Nov 2020 — 2 commits
    • gfs2: set lockdep subclass for iopen glocks · 515b269d
      Authored by Alexander Aring
      This patch introduces a new glops attribute to define the subclass of the
      glock lockref spinlock. This avoids the following lockdep warning, which
      occurs when we lock an inode lock while an iopen lock is held:
      
      ============================================
      WARNING: possible recursive locking detected
      5.10.0-rc3+ #4990 Not tainted
      --------------------------------------------
      kworker/0:1/12 is trying to acquire lock:
      ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20
      
      but task is already holding lock:
      ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&gl->gl_lockref.lock);
        lock(&gl->gl_lockref.lock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by kworker/0:1/12:
       #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
       #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260
      
      stack backtrace:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Workqueue: delete_workqueue delete_work_func
      Call Trace:
       dump_stack+0x8b/0xb0
       __lock_acquire.cold+0x19e/0x2e3
       lock_acquire+0x150/0x410
       ? lockref_get+0x9/0x20
       _raw_spin_lock+0x27/0x40
       ? lockref_get+0x9/0x20
       lockref_get+0x9/0x20
       delete_work_func+0x188/0x260
       process_one_work+0x237/0x540
       worker_thread+0x4d/0x3b0
       ? process_one_work+0x540/0x540
       kthread+0x127/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      515b269d
    • gfs2: Fix deadlock dumping resource group glocks · 16e6281b
      Authored by Alexander Aring
      Commit 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
      introduced additional locking in gfs2_rgrp_go_dump, which is also used for
      dumping resource group glocks via debugfs.  However, on that code path, the
      glock spin lock is already taken in dump_glock, and taking it again in
      gfs2_glock2rgrp leads to deadlock.  This can be reproduced with:
      
        $ mkfs.gfs2 -O -p lock_nolock /dev/FOO
        $ mount /dev/FOO /mnt/foo
        $ touch /mnt/foo/bar
        $ cat /sys/kernel/debug/gfs2/FOO/glocks
      
      Fix that by not taking the glock spin lock inside the go_dump callback.
      
      Fixes: 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      16e6281b
  5. 18 Nov 2020 — 1 commit
    • gfs2: Fix regression in freeze_go_sync · 20b32912
      Authored by Bob Peterson
      Patch 541656d3 ("gfs2: freeze should work on read-only mounts") changed
      the check for glock state in function freeze_go_sync() from "gl->gl_state
      == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE".  That's wrong and it
      regressed gfs2's freeze/thaw mechanism because it caused only the freezing
      node (which requests the glock in EX) to queue freeze work.
      
      All nodes go through this go_sync code path during the freeze to drop their
      SHared hold on the freeze glock, allowing the freezing node to acquire it
      in EXclusive mode. But all the nodes must freeze access to the file system
      locally, so they ALL must queue freeze work. The freeze_work calls
      freeze_func, which makes a request to reacquire the freeze glock in SH,
      effectively blocking until the thaw from the EX holder. Once thawed, the
      freezing node drops its EX hold on the freeze glock, then the (blocked)
      freeze_func reacquires the freeze glock in SH again (on all nodes, including
      the freezer) so all nodes go back to a thawed state.
      
      This patch changes the check back to gl_state == LM_ST_SHARED like it was
      prior to 541656d3.
      
      Fixes: 541656d3 ("gfs2: freeze should work on read-only mounts")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      20b32912
  6. 13 Nov 2020 — 2 commits
    • gfs2: Fix case in which ail writes are done to jdata holes · 4e79e3f0
      Authored by Bob Peterson
      Patch b2a846db ("gfs2: Ignore journal log writes for jdata holes")
      tried (unsuccessfully) to fix a case in which writes were done to jdata
      blocks, the blocks were sent to the ail list, and then a punch_hole or
      truncate operation caused the blocks to be freed. In other words, the ail
      items were for jdata holes. Before b2a846db, a jdata hole caused function
      gfs2_block_map to return -EIO, which was eventually interpreted as an
      IO error to the journal, followed by a withdraw.
      
      This patch changes function gfs2_get_block_noalloc, which is only used
      for jdata writes, so it returns -ENODATA rather than -EIO, and when
      -ENODATA is returned to gfs2_ail1_start_one, the error is ignored.
      We can safely ignore it because gfs2_ail1_start_one is only called
      when the jdata pages have already been written and truncated, so the
      ail1 content no longer applies.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      4e79e3f0
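      A simplified model of this error translation (enum and function names invented for illustration, not the real gfs2 code): the jdata-only mapping helper turns a hole into -ENODATA, and the ail1 writer treats -ENODATA as "nothing to write" while any other error still triggers a withdraw.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mapping results for a toy block-mapping helper. */
enum map_result { BLOCK_MAPPED, BLOCK_HOLE, BLOCK_ERROR };

static int get_block_noalloc(enum map_result r)
{
    switch (r) {
    case BLOCK_MAPPED: return 0;
    case BLOCK_HOLE:   return -ENODATA; /* jdata hole: safe to ignore */
    default:           return -EIO;     /* genuine I/O error */
    }
}

static bool ail1_should_withdraw(int err)
{
    return err != 0 && err != -ENODATA; /* -ENODATA is not fatal */
}
```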
    • Revert "gfs2: Ignore journal log writes for jdata holes" · d3039c06
      Authored by Bob Peterson
      This reverts commit b2a846db.
      
      That commit changed the behavior of function gfs2_block_map to return
      -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
      false.  While that fixed the intended problem for jdata, it also broke
      other callers of gfs2_block_map such as some jdata block reads.  Before
      the patch, an encountered hole would be skipped and the buffer seen as
      unmapped by the caller.  The patch changed the behavior to return
      -ENODATA, which is interpreted as an error by the caller.
      
      The -ENODATA return code should be restricted to the specific case where
      jdata holes are encountered during ail1 writes.  That will be done in a
      later patch.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d3039c06
  7. 12 Nov 2020 — 1 commit
  8. 03 Nov 2020 — 2 commits
  9. 30 Oct 2020 — 6 commits
    • gfs2: check for live vs. read-only file system in gfs2_fitrim · c5c68724
      Authored by Bob Peterson
      Before this patch, gfs2_fitrim did not properly check for a "live" file
      system. If the file system had something to trim and the file system
      was read-only (or spectator), it would start the trim, but when it started
      the transaction, gfs2_trans_begin returned -EROFS (read-only file system)
      and it errored out. However, if the file system was already trimmed and
      there was no work to do, gfs2_trans_begin was never called; that code was
      bypassed, so the error was never returned. Instead, a good return code
      came back with 0 work done. All this made for inconsistent behavior:
      the same fstrim command could return -EROFS in one case and 0 in another.
      This tripped up xfstests generic/537 which reports the error as:
      
          +fstrim with unrecovered metadata just ate your filesystem
      
      This patch adds a check for a "live" (that is, active-journal, read-write)
      file system, and if there isn't one, returns the error properly.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c5c68724
    • gfs2: don't initialize statfs_change inodes in spectator mode · 7e5b9266
      Authored by Bob Peterson
      Before commit 97fd734b, the local statfs_changeX inodes were never
      initialized for spectator mounts, yet the unmount path still checked for
      them when unmounting everything. There's no good reason to look up the
      statfs_changeX files, because spectators cannot perform recovery. A
      spectator mount still needs the master statfs file for statfs calls,
      however. This patch adds a check for spectator mounts to init_statfs.
      
      Fixes: 97fd734b ("gfs2: lookup local statfs inodes prior to journal recovery")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      7e5b9266
    • gfs2: Split up gfs2_meta_sync into inode and rgrp versions · 4a55752a
      Authored by Bob Peterson
      Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write
      the address space for the metadata being synced. That's great for inodes, but
      resource groups all point to the same superblock address space, sdp->sd_aspace,
      and each rgrp has its own range of blocks on which it should operate. That meant
      every time an rgrp's metadata was synced, the whole address space was written
      instead of just the rgrp's range.
      
      This patch eliminates function gfs2_meta_sync and tailors specific metasync
      functions for inodes and rgrps.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      4a55752a
    • gfs2: init_journal's undo directive should also undo the statfs inodes · c4af59bd
      Authored by Bob Peterson
      Before this patch, function init_journal's "undo" directive jumped to label
      fail_jinode_gh. But now that it does statfs initialization, it needs to
      jump to fail_statfs instead. Failure to do so means that mount failures
      after init_journal is successful will neglect to let go of the proper
      statfs information, stranding the statfs_changeX inodes. That makes it
      impossible to free their glocks, and results in:
      
       gfs2: fsid=sda.s: G:  s:EX n:2/805f f:Dqob t:EX d:UN/603701000 a:0 v:0 r:4 m:200 p:1
       gfs2: fsid=sda.s:  H: s:EX f:H e:0 p:1397947 [(ended)] init_journal+0x548/0x890 [gfs2]
       gfs2: fsid=sda.s:  I: n:6/32863 t:8 f:0x00 d:0x00000201 s:24 p:0
       gfs2: fsid=sda.s: G:  s:SH n:5/805f f:Dqob t:SH d:UN/603712000 a:0 v:0 r:3 m:200 p:0
       gfs2: fsid=sda.s:  H: s:SH f:EH e:0 p:1397947 [(ended)] gfs2_inode_lookup+0x1fb/0x410 [gfs2]
       VFS: Busy inodes after unmount of sda. Self-destruct in 5 seconds.  Have a nice day...
      
      The next time the file system is mounted, it then reuses the same glocks,
      which ends in a kernel NULL pointer dereference when trying to dump the
      reused glock.
      
      This patch makes the "undo" function of init_journal jump to fail_statfs
      so the statfs files are properly deconstructed upon failure.
      
      Fixes: 97fd734b ("gfs2: lookup local statfs inodes prior to journal recovery")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c4af59bd
    • gfs2: Add missing truncate_inode_pages_final for sd_aspace · a9dd945c
      Authored by Bob Peterson
      Gfs2 creates an address space for its rgrps called sd_aspace, but it never
      called truncate_inode_pages_final on it. This greatly confused the VFS,
      which tried to reference the address space after gfs2 had freed the
      superblock that contained it.
      
      This patch adds a call to truncate_inode_pages_final for sd_aspace, thus
      avoiding the use-after-free.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a9dd945c
    • gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free · d0f17d38
      Authored by Bob Peterson
      Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling
      return_all_reservations, but return_all_reservations still dereferences
      rgd->rd_bits in __rs_deltree.  Fix that by moving the call to kfree below the
      call to return_all_reservations.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d0f17d38
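      The ordering fix can be illustrated with a minimal stand-in (not the real gfs2_rgrpd; structure and function names are simplified for illustration): every caller that still dereferences rd_bits must run before the free.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the rgrp descriptor; only the field that matters. */
struct rgrpd { int *rd_bits; };

static int return_all_reservations(struct rgrpd *rgd)
{
    return rgd->rd_bits[0]; /* still reads rd_bits */
}

static int clear_rgrpd(struct rgrpd *rgd)
{
    int v = return_all_reservations(rgd); /* use first ... */
    free(rgd->rd_bits);                   /* ... free afterwards */
    rgd->rd_bits = NULL;
    return v;
}
```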
  10. 23 Oct 2020 — 2 commits
    • gfs2: Recover statfs info in journal head · bedb0f05
      Authored by Abhi Das
      Apply the outstanding statfs changes in the journal head to the
      master statfs file. Zero out the local statfs file for good measure.
      
      Previously, statfs updates would be read in from the local statfs inode and
      synced to the master statfs inode during recovery.
      
      We now use the statfs updates in the journal head to update the master statfs
      inode instead of reading in from the local statfs inode. To preserve backward
      compatibility with kernels that can't do this, we still need to keep the
      local statfs inode up to date by writing changes to it. At some point in the
      future, we can do away with the local statfs inodes altogether and keep the
      statfs changes solely in the journal.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      bedb0f05
    • gfs2: lookup local statfs inodes prior to journal recovery · 97fd734b
      Authored by Abhi Das
      We need to lookup the master statfs inode and the local statfs
      inodes earlier in the mount process (in init_journal) so journal
      recovery can use them when it attempts to recover the statfs info.
      We lookup all the local statfs inodes and store them in a linked
      list to allow a node to recover statfs info for other nodes in the
      cluster.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      97fd734b
  11. 21 Oct 2020 — 5 commits
  12. 15 Oct 2020 — 15 commits
    • gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) · e2c6c8a7
      Authored by Bob Peterson
      Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
      when a glock had a holder queued. It was only checked for inode glocks,
      although set and cleared by all glocks, and it was only used to determine
      whether the glock should be held for the minimum hold time before releasing.
      
      The problem is that the flag was not accurate at all. When a process held
      the glock, the flag was set. When it dequeued the glock, the flag was only
      cleared in cases where the state actually changed. So if the state didn't
      change, the flag could still be set even when nothing was queued.
      
      This happens to iopen glocks often: they get held in SH, then the file is
      closed, but the glock remains in SH mode.
      
      We don't need a special flag to indicate this: we can simply check whether
      the glock has any items on its holders queue. Maintaining the flag is a
      waste of cpu time.
      
      This patch eliminates the flag in favor of simply checking list_empty
      on the glock holders.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      e2c6c8a7
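      A minimal sketch of the idea, deriving "has holders" directly from the holders list instead of a flag that can go stale. The tiny circular doubly linked list below mimics the kernel's struct list_head; the predicate name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list, kernel-style. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next; n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* The glock honors its minimum hold time only while a holder is queued. */
static bool use_min_hold_time(const struct list_head *gl_holders)
{
    return !list_empty(gl_holders);
}
```

      Unlike a separately maintained flag, list_empty cannot disagree with the list it summarizes, so there is no state to forget to clear.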
    • gfs2: Ignore journal log writes for jdata holes · b2a846db
      Authored by Bob Peterson
      When flushing out its ail1 list, gfs2_write_jdata_page calls function
      __block_write_full_page passing in function gfs2_get_block_noalloc.
      But there was a problem when a process wrote to a jdata file, then
      truncated it or punched a hole, leaving references to the blocks within
      the new hole in its ail list, which are to be written to the journal log.
      
      In writing them to the journal, after calling gfs2_block_map, function
      gfs2_get_block_noalloc determined that the (hole-punched) block was not
      mapped, so it returned -EIO to generic_writepages, which passed it back
      to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
      there was a real IO error writing to the journal.
      
      This might be a valid error when writing metadata to the journal, but for
      journaled data writes, it does not warrant a withdraw.
      
      This patch adds a check to function gfs2_block_map that makes an exception
      for journaled data writes that correspond to jdata holes: If the iomap
      get function returns a block type of IOMAP_HOLE, it instead returns
      -ENODATA which does not cause the withdraw. Other errors are returned as
      before.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      b2a846db
    • gfs2: simplify gfs2_block_map · a6645745
      Authored by Bob Peterson
      Function gfs2_block_map had a lot of redundancy between its create and
      no_create paths. This patch simplifies the code to eliminate the redundancy.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a6645745
    • gfs2: Only set PageChecked if we have a transaction · 6302d6f4
      Authored by Bob Peterson
      With jdata writes, we frequently got into situations where gfs2 deadlocked
      because of this calling sequence:
      
      gfs2_ail1_start
         gfs2_ail1_flush - for every tr on the sd_ail1_list:
            gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
               generic_writepages
                  write_cache_pages, passing __writepage(), which
                     calls clear_page_dirty_for_io, which calls set_page_dirty,
                        which calls jdata_set_page_dirty, which sets PageChecked;
                  __writepage() then calls
                     mapping->a_ops->writepage, AKA gfs2_jdata_writepage
      
      However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
      ignores the write and redirties the page. The problem is that write_cache_pages
      calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
      in page-writeback.c starting with "Yes, Virginia". If it's jdata,
      set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
      That causes a conflict because it makes it look like the page has been
      redirtied by another writer, in which case we need to skip writing it and
      redirty the page. That ends up in a deadlock because it isn't a "real" writer
      and nothing will ever clear PageChecked.
      
      If we do have a real writer, it will have started a transaction. So this
      patch checks if a transaction is in use, and if not, it skips setting
      PageChecked. That way, the page will be dirtied, cleaned, and written
      appropriately.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      6302d6f4
    • gfs2: don't lock sd_ail_lock in gfs2_releasepage · 249ffe18
      Authored by Bob Peterson
      Patch 380f7c65 changed gfs2_releasepage
      so that it held the sd_ail_lock spin_lock for most of its processing.
      It did this for some mysterious undocumented bug somewhere in the
      evict code path. But in the nine years since, evict has been reworked
      and fixed many times, and so have the transactions and ail list.
      I can't see a reason to hold the sd_ail_lock unless it's protecting
      the actual ail lists hung off the transactions. Therefore, this patch
      removes the locking to increase speed and efficiency, and to further help
      us rework the log flush code to be more concurrent with transactions.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      249ffe18
    • gfs2: make gfs2_ail1_empty_one return the count of active items · 36c78309
      Authored by Bob Peterson
      This patch is one baby step toward simplifying the journal management.
      It simply changes function gfs2_ail1_empty_one from a void to an int and
      makes it return a count of active items. This allows the caller to check
      the return code rather than list_empty on the tr_ail1_list. This way
      we can, in a later patch, combine transaction ail1 and ail2 lists.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      36c78309
    • gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe · 68942870
      Authored by Bob Peterson
      Before this patch, when blocks were freed, it called gfs2_meta_wipe to
      take the metadata out of the pending journal blocks. It did this mostly
      by calling another function called gfs2_remove_from_journal. This is
      shortsighted because it does not do anything with jdata blocks which
      may also be in the journal.
      
      This patch expands the function so that it wipes out jdata blocks from
      the journal as well, and it wipes it from the ail1 list if it hasn't
      been written back yet. Since it now processes jdata blocks as well,
      the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.
      
      New function gfs2_ail1_wipe wants a static view of the ail list, so it
      locks the sd_ail_lock when removing items. To accomplish this, function
      gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
      the caller's responsibility to do so.
      
      I was going to make sd_ail_lock locking conditional, but the practice is
      generally frowned upon. For details, see: https://lwn.net/Articles/109066/
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      68942870
    • gfs2: enhance log_blocks trace point to show log blocks free · 97c5e43d
      Authored by Bob Peterson
      This patch adds some code to enhance the log_blocks trace point. It
      reports the number of free log blocks. This makes the trace point much
      more useful, especially for debugging performance problems when we can
      tell when the journal gets full and needs to wait for flushes, etc.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      97c5e43d
    • gfs2: add missing log_blocks trace points in gfs2_write_revokes · 77650bdb
      Authored by Bob Peterson
      Function gfs2_write_revokes was incrementing and decrementing the number
      of log blocks free, but there was never a log_blocks trace point for it.
      Thus, the free blocks from a log_blocks trace would jump around
      mysteriously.
      
      This patch adds the missing trace points so the trace makes more sense.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      77650bdb
    • gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm · 21b6924b
      Authored by Bob Peterson
      Since the function is only used for writing jdata pages, this patch
      simply renames function gfs2_write_full_page to a more appropriate
      name: gfs2_write_jdata_page. This makes the code easier to understand.
      
      The function was only called in one place, which passed in a pointer to
      function gfs2_get_block_noalloc. Since that function doesn't need to be
      passed in, this patch also eliminates the unnecessary parameter.
      
      I also took the liberty of cleaning up the function comments.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      21b6924b
    • gfs2: add validation checks for size of superblock · 0ddc5154
      Authored by Anant Thazhemadam
      In gfs2_check_sb(), no validation checks are performed with regards to
      the size of the superblock.
      syzkaller detected a slab-out-of-bounds bug that was primarily caused
      because the block size for a superblock was set to zero.
      A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
      Performing validation checks and ensuring that the size of the superblock
      is valid fixes this bug.
      
      Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Suggested-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
      [Minor code reordering.]
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      0ddc5154
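      The validation rule stated above (a power of 2 between 512 and PAGE_SIZE) can be sketched as a small predicate; PAGE_SIZE is assumed to be 4 KiB here purely for illustration, and the function name is invented.

```c
#include <assert.h>
#include <stdbool.h>

#define SB_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* A superblock block size is valid only if it is a power of two
 * between 512 and PAGE_SIZE, inclusive. */
static bool sb_size_valid(unsigned int bsize)
{
    if (bsize < 512 || bsize > SB_PAGE_SIZE)
        return false;
    return (bsize & (bsize - 1)) == 0; /* power-of-two test */
}
```

      The range check runs first, so a block size of zero (the syzkaller case) is rejected before the bit trick, which would otherwise accept it.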
    • gfs2: use-after-free in sysfs deregistration · c2a04b02
      Authored by Jamie Iles
      syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:
      
        Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228
      
        CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
        Hardware name: linux,dummy-virt (DT)
        Workqueue: events kobject_delayed_cleanup
        Call trace:
         dump_backtrace+0x0/0x4d8
         show_stack+0x34/0x48
         dump_stack+0x174/0x1f8
         print_address_description.constprop.0+0x5c/0x550
         kasan_report+0x13c/0x1c0
         __asan_report_load1_noabort+0x34/0x60
         memcmp+0xd0/0xd8
         gfs2_uevent+0xc4/0x188
         kobject_uevent_env+0x54c/0x1240
         kobject_uevent+0x2c/0x40
         __kobject_del+0x190/0x1d8
         kobject_delayed_cleanup+0x2bc/0x3b8
         process_one_work+0x96c/0x18c0
         worker_thread+0x3f0/0xc30
         kthread+0x390/0x498
         ret_from_fork+0x10/0x18
      
        Allocated by task 1110:
         kasan_save_stack+0x28/0x58
         __kasan_kmalloc.isra.0+0xc8/0xe8
         kasan_kmalloc+0x10/0x20
         kmem_cache_alloc_trace+0x1d8/0x2f0
         alloc_super+0x64/0x8c0
         sget_fc+0x110/0x620
         get_tree_bdev+0x190/0x648
         gfs2_get_tree+0x50/0x228
         vfs_get_tree+0x84/0x2e8
         path_mount+0x1134/0x1da8
         do_mount+0x124/0x138
         __arm64_sys_mount+0x164/0x238
         el0_svc_common.constprop.0+0x15c/0x598
         do_el0_svc+0x60/0x150
         el0_svc+0x34/0xb0
         el0_sync_handler+0xc8/0x5b4
         el0_sync+0x15c/0x180
      
        Freed by task 228:
         kasan_save_stack+0x28/0x58
         kasan_set_track+0x28/0x40
         kasan_set_free_info+0x24/0x48
         __kasan_slab_free+0x118/0x190
         kasan_slab_free+0x14/0x20
         slab_free_freelist_hook+0x6c/0x210
         kfree+0x13c/0x460
      
      Use the same pattern as f2fs + ext4 where the kobject destruction must
      complete before allowing the FS itself to be freed.  This means that we
      need an explicit free_sbd in the callers.
      
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Jamie Iles <jamie@nuviainc.com>
      [Also go to fail_free when init_names fails.]
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c2a04b02
    • gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump · 0e539ca1
      Authored by Andrew Price
      When an rindex entry is found to be corrupt, compute_bitstructs() calls
      gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:
      
          gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
      
      gfs2_rgrp_dump then dereferences the gl without checking it and we get
      
          BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280
      
      because there's no rgrp glock involved while reading the rindex on mount.
      
      Fix this by changing gfs2_rgrp_dump to take an rgrp argument.
      
      Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      0e539ca1
    • gfs2: use iomap for buffered I/O in ordered and writeback mode · 2164f9b9
      Authored by Christoph Hellwig
      Switch to using the iomap readpage and writepage helpers for all I/O in
      the ordered and writeback modes, and thus eliminate using buffer_heads
      for I/O in these cases.  The journaled data mode is left untouched.
      
      (Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
      to set_page_dirty instead of accidentally leaving the page / buffer clean.)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      2164f9b9
    • gfs2: call truncate_inode_pages_final for address space glocks · ee1e2c77
      Authored by Bob Peterson
      Before this patch, we were not calling truncate_inode_pages_final for the
      address space of glocks, which left the possibility of a leak. We now
      take care of the problem instead of complaining, and we do it during
      glock tear-down.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      ee1e2c77