提交 · 38a618dbf47f837f11df01052977dcaf31c5c2a8 · openeuler / Kernel

28 6月, 2021 1 次提交

gfs2: Use list_move_tail instead of list_del/list_add_tail · 38a618db

由 Baokun Li 提交于 6月 08, 2021

Using list_move_tail() instead of list_del() + list_add_tail().
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

38a618db

31 5月, 2021 1 次提交

gfs2: Fix use-after-free in gfs2_glock_shrink_scan · 1ab19c5d

由 Hillf Danton 提交于 5月 18, 2021

The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to
remove the glock from the lru list in __gfs2_glock_put().

On the shrink scan path, the same flag is cleared under lru_lock but because
of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the
put side can be made without deleting the glock from the lru list.

Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to
ensure correct behavior on both sides - clear GLF_LRU after list_del under
lru_lock.
Reported-by: Nsyzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com>
Signed-off-by: NHillf Danton <hdanton@sina.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

1ab19c5d

20 5月, 2021 2 次提交

gfs2: fix a deadlock on withdraw-during-mount · 865cc3e9

由 Bob Peterson 提交于 5月 18, 2021

Before this patch, gfs2 would deadlock because of the following
sequence during mount:

mount
   gfs2_fill_super
      gfs2_make_fs_rw <--- Detects IO error with glock
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

logd
   Detects IO error and the need to withdraw
   calls gfs2_withdraw
      gfs2_make_fs_ro
         kthread_stop(sdp->sd_quotad_process);
            <--- Blocked waiting for quotad to finish

gfs2_quotad
   gfs2_statfs_sync
      gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted

glock_work_func
   do_xmote <---Detects IO error, can't release glock: blocked on withdraw
      glops->go_inval
      glock_blocked_by_withdraw
         requeue glock work & exit <--- work requeued, blocked by withdraw

This patch makes a special exception for the statfs system inode glock,
which allows the statfs glock UNLOCK to proceed normally. That allows the
quotad daemon to exit during the withdraw, which allows the logd daemon
to exit during the withdraw, which allows the mount to exit.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

865cc3e9

gfs2: fix scheduling while atomic bug in glocks · 20265d9a

由 Bob Peterson 提交于 5月 18, 2021

Before this patch, in the unlikely event that gfs2_glock_dq encountered
a withdraw, it would do a wait_on_bit to wait for its journal to be
recovered, but it never released the glock's spin_lock, which caused a
scheduling-while-atomic error.

This patch unlocks the lockref spin_lock before waiting for recovery.

Fixes: 601ef0d5 ("gfs2: Force withdraw to replay journals and wait for it to finish")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: NAlexander Aring <aahringo@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

20265d9a

06 5月, 2021 1 次提交

mm: introduce and use mapping_empty() · 7716506a

由 Matthew Wilcox (Oracle) 提交于 5月 04, 2021

Patch series "Remove nrexceptional tracking", v2.

We actually use nrexceptional for very little these days. It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger with
the THP patches because we don't know how many indices a shadow entry
occupies. It's easier to just remove it than keep it accurate.

Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects. Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.

This patch (of 4):

Instead of checking the two counters (nrpages and nrexceptional), we can
just check whether i_pages is empty.

Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: NVishal Verma <vishal.l.verma@intel.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7716506a

10 4月, 2021 1 次提交

gfs2: Fix a number of kernel-doc warnings · c551f66c

由 Lee Jones 提交于 3月 30, 2021

Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions.  Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.

Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

c551f66c

09 4月, 2021 1 次提交

treewide: Change list_sort to use const pointers · 4f0f586b

由 Sami Tolvanen 提交于 4月 08, 2021

list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.
Suggested-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NSami Tolvanen <samitolvanen@google.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Tested-by: NNick Desaulniers <ndesaulniers@google.com>
Tested-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com

4f0f586b

04 4月, 2021 1 次提交

gfs2: Eliminate gh parameter from go_xmote_bh func · f68effb3

由 Bob Peterson 提交于 3月 19, 2021

The only glock that uses go_xmote_bh glops function is the freeze glock
which uses freeze_go_xmote_bh. It does not use its gh parameter, so
this patch eliminates the unneeded parameter.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

f68effb3

18 2月, 2021 1 次提交

gfs2: Allow node-wide exclusive glock sharing · 06e908cd

由 Bob Peterson 提交于 4月 18, 2018

Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes who are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
From the point of view of other nodes, the lock is still held
exclusively.

A future patch will start using this flag to improve performance with
rgrp sharing.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

06e908cd

01 12月, 2020 1 次提交

Revert "GFS2: Prevent delete work from occurring on glocks used for create" · a55a47a3

由 Andreas Gruenbacher 提交于 11月 27, 2020

Since commit a0e3cc65 ("gfs2: Turn gl_delete into a delayed work"), we're
cancelling any pending delete work of an iopen glock before attaching a new
inode to that glock in gfs2_create_inode. This means that delete_work_func can
no longer be queued or running when attaching the iopen glock to the new inode,
and we can revert commit a4923865 ("GFS2: Prevent delete work from
occurring on glocks used for create"), which tried to achieve the same but in a
racy way.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

a55a47a3

25 11月, 2020 1 次提交

gfs2: set lockdep subclass for iopen glocks · 515b269d

由 Alexander Aring 提交于 11月 23, 2020

This patch introduce a new globs attribute to define the subclass of the
glock lockref spinlock. This avoid the following lockdep warning, which
occurs when we lock an inode lock while an iopen lock is held:

============================================
WARNING: possible recursive locking detected
5.10.0-rc3+ #4990 Not tainted
--------------------------------------------
kworker/0:1/12 is trying to acquire lock:
ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20

but task is already holding lock:
ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&gl->gl_lockref.lock);
  lock(&gl->gl_lockref.lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/12:
 #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
 #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
 #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

stack backtrace:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: delete_workqueue delete_work_func
Call Trace:
 dump_stack+0x8b/0xb0
 __lock_acquire.cold+0x19e/0x2e3
 lock_acquire+0x150/0x410
 ? lockref_get+0x9/0x20
 _raw_spin_lock+0x27/0x40
 ? lockref_get+0x9/0x20
 lockref_get+0x9/0x20
 delete_work_func+0x188/0x260
 process_one_work+0x237/0x540
 worker_thread+0x4d/0x3b0
 ? process_one_work+0x540/0x540
 kthread+0x127/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
Suggested-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAlexander Aring <aahringo@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

515b269d

03 11月, 2020 1 次提交

gfs2: Wake up when sd_glock_disposal becomes zero · da7d554f

由 Alexander Aring 提交于 10月 26, 2020

Commit fc0e38da ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero.  As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up.  Add the missing wakeup.

Fixes: fc0e38da ("GFS2: Fix glock deallocation race")
Cc: stable@vger.kernel.org # v2.6.39+
Signed-off-by: NAlexander Aring <aahringo@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

da7d554f

21 10月, 2020 2 次提交

gfs2: Only access gl_delete for iopen glocks · 2ffed529

由 Bob Peterson 提交于 10月 15, 2020

Only initialize gl_delete for iopen glocks, but more importantly, only access
it for iopen glocks in flush_delete_work: flush_delete_work is called for
different types of glocks including rgrp glocks, and those use gl_vm which is
in a union with gl_delete.  Without this fix, we'll end up clobbering gl_vm,
which results in general memory corruption.

Fixes: a0e3cc65 ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

2ffed529

gfs2: Fix comments to glock_hash_walk · dbffb29d

由 Bob Peterson 提交于 10月 12, 2020

The comments before function glock_hash_walk had the wrong name and
an extra parameter. This simply fixes the comments.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

dbffb29d

15 10月, 2020 3 次提交

gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) · e2c6c8a7

由 Bob Peterson 提交于 10月 12, 2020

Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
when a glock had a holder queued. It was only checked for inode glocks,
although set and cleared by all glocks, and it was only used to determine
whether the glock should be held for the minimum hold time before releasing.

The problem is that the flag is not accurate at all. If a process holds
the glock, the flag is set. When they dequeue the glock, it only cleared
the flag in cases when the state actually changed. So if the state doesn't
change, the flag may still be set, even when nothing is queued.

This happens to iopen glocks often: the get held in SH, then the file is
closed, but the glock remains in SH mode.

We don't need a special flag to indicate this: we can simply tell whether
the glock has any items queued to the holders queue. It's a waste of cpu
time to maintain it.

This patch eliminates the flag in favor of simply checking list_empty
on the glock holders.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

e2c6c8a7

gfs2: call truncate_inode_pages_final for address space glocks · ee1e2c77

由 Bob Peterson 提交于 9月 16, 2020

Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down..
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

ee1e2c77

gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro · e8a8023e

由 Liu Shixin 提交于 9月 16, 2020

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

e8a8023e

03 8月, 2020 2 次提交

gfs2: Fix refcount leak in gfs2_glock_poke · c07bfb4d

由 Andreas Gruenbacher 提交于 7月 27, 2020

In gfs2_glock_poke, make sure gfs2_holder_uninit is called on the local
glock holder.  Without that, we're leaking a glock and a pid reference.

Fixes: 9e8990de ("gfs2: Smarter iopen glock waiting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

c07bfb4d

gfs2: Add some flags missing from glock output · 5deaf1f6

由 Bob Peterson 提交于 6月 09, 2020

Before this patch, three flags were not represented in the glock output.
This patch adds them in:

c - GLF_INODE_CREATING
P - GLF_PENDING_DELETE
x - GLF_FREEING (both f and F are already used)
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

5deaf1f6

30 6月, 2020 1 次提交

gfs2: Don't sleep during glock hash walk · 34244d71

由 Andreas Gruenbacher 提交于 6月 10, 2020

In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

34244d71

06 6月, 2020 8 次提交

gfs2: Smarter iopen glock waiting · 9e8990de

由 Andreas Gruenbacher 提交于 1月 17, 2020

When trying to upgrade the iopen glock from a shared to an exclusive lock in
gfs2_evict_inode, abort the wait if there is contention on the corresponding
inode glock: in that case, the inode must still be in active use on another
node, and we're not guaranteed to get the iopen glock anytime soon.

To make this work even better, when we notice contention on the iopen glock and
we can't evict the corresponsing inode and release the iopen glock immediately,
poke the inode glock. The other node(s) trying to acquire the lock can then
abort instead of timing out.

Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
version of this patch.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

9e8990de

gfs2: Wake up when setting GLF_DEMOTE · 35b6f8fb

由 Andreas Gruenbacher 提交于 1月 17, 2020

Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE
flag.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

35b6f8fb

gfs2: Check inode generation number in delete_work_func · b0dcffd8

由 Andreas Gruenbacher 提交于 1月 15, 2020

In delete_work_func, if the iopen glock still has an inode attached,
limit the inode lookup to that specific generation number: in the likely
case that the inode was deleted on the node on which the inode's link
count dropped to zero, we can skip verifying the on-disk block type and
reading in the inode. The same applies if another node that had the
inode open managed to delete the inode before us.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

b0dcffd8

gfs2: Minor gfs2_lookup_by_inum cleanup · 6bdcadea

由 Andreas Gruenbacher 提交于 1月 15, 2020

Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
generation number will qualify: a valid inode never has a zero no_formal_ino.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

6bdcadea

gfs2: Give up the iopen glock on contention · 8c7b9262

由 Andreas Gruenbacher 提交于 1月 13, 2020

When there's contention on the iopen glock, it means that the link count
of the corresponding inode has dropped to zero on a remote node which is
now trying to delete the inode.  In that case, try to evict the inode so
that the iopen glock will be released, which will allow the remote node
to do its job.

When the inode is still open locally, the inode's reference count won't
drop to zero and so we'll keep holding the inode and its iopen glock.
The remote node will time out its request to grab the iopen glock, and
when the inode is finally closed locally, we'll try to delete it
ourself.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

8c7b9262

gfs2: Turn gl_delete into a delayed work · a0e3cc65

由 Andreas Gruenbacher 提交于 1月 16, 2020

This requires flushing delayed work items in gfs2_make_fs_ro (which is called
before unmounting a filesystem).

When inodes are deleted and then recreated, pending gl_delete work items would
have no effect because the inode generations will have changed, so we can
cancel any pending gl_delete works before reusing iopen glocks.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

a0e3cc65

gfs2: Keep track of deleted inode generations in LVBs · f286d627

由 Andreas Gruenbacher 提交于 1月 13, 2020

When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB). When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes. This avoids taking the resource
group glock in order to verify the block type.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

f286d627

gfs2: Allow ASPACE glocks to also have an lvb · 15f2547b

由 Bob Peterson 提交于 1月 15, 2020

Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

15f2547b

05 6月, 2020 2 次提交

gfs2: introduce new gfs2_glock_assert_withdraw · ea4e61c7

由 Bob Peterson 提交于 5月 23, 2020

Before this patch, asserts based on glocks did not print the glock with
the error. This patch introduces a new macro, gfs2_glock_assert_withdraw
which first prints the glock, then takes the assert.

This also changes a few glock asserts to the new macro.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

ea4e61c7

gfs2: print mapping->nrpages in glock dump for address space glocks · 7e901d6e

由 Bob Peterson 提交于 5月 22, 2020

This patch makes the glock dumps in debugfs print the number of pages
(nrpages) for address space glocks. This will aid in debugging.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

7e901d6e

09 5月, 2020 2 次提交

Revert "gfs2: Don't demote a glock until its revokes are written" · b14c9490

由 Bob Peterson 提交于 5月 08, 2020

This reverts commit df5db5f9.

This patch fixes a regression: patch df5db5f9 allowed function
run_queue() to bypass its call to do_xmote() if revokes were queued for
the glock. That's wrong because its call to do_xmote() is what is
responsible for calling the go_sync() glops functions to sync both
the ail list and any revokes queued for it. By bypassing the call,
gfs2 could get into a stand-off where the glock could not be demoted
until its revokes are written back, but the revokes would not be
written back because do_xmote() was never called.

It "sort of" works, however, because there are other mechanisms like
the log flush daemon (logd) that can sync the ail items and revokes,
if it deems it necessary. The problem is: without file system pressure,
it might never deem it necessary.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

b14c9490

gfs2: If go_sync returns error, withdraw but skip invalidate · b11e1a84

由 Bob Peterson 提交于 5月 07, 2020

Before this patch, if the go_sync operation returned an error during
the do_xmote process (such as unable to sync metadata to the journal)
the code did goto out. That kept the glock locked, so it could not be
given away, which correctly avoids file system corruption. However,
it never set the withdraw bit or requeueing the glock work. So it would
hang forever, unable to ever demote the glock.

This patch changes to goto to a new label, skip_inval, so that errors
from go_sync are treated the same way as errors from go_inval:
The delayed withdraw bit is set and the work is requeued. That way,
the logd should eventually figure out there's a problem and withdraw
properly there.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

b11e1a84

08 5月, 2020 1 次提交

gfs2: Fix error exit in do_xmote · a8b7528b

由 Bob Peterson 提交于 4月 23, 2020

Before this patch, if an error was detected from glock function go_sync
by function do_xmote, it would return.  But the function had temporarily
unlocked the gl_lockref spin_lock, and it never re-locked it.  When the
caller of do_xmote tried to unlock it again, it was already unlocked,
which resulted in a corrupted spin_lock value.

This patch makes sure the gl_lockref spin_lock is re-locked after it is
unlocked.

Thanks to Wu Bo <wubo40@huawei.com> for reporting this problem.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

a8b7528b

28 3月, 2020 1 次提交

gfs2: Switch to list_{first,last}_entry · 969183bc

由 Andreas Gruenbacher 提交于 2月 03, 2020

Replace open-coded versions of list_first_entry and list_last_entry with those
functions.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

969183bc

27 2月, 2020 5 次提交

gfs2: Do proper error checking for go_sync family of glops functions · 1c634f94

由 Bob Peterson 提交于 11月 13, 2019

Before this patch, function do_xmote would try to sync out the glock
dirty data by calling the appropriate glops function XXX_go_sync()
but it did not check for a good return code. If the sync was not
possible due to an io error or whatever, do_xmote would continue on
and call go_inval and release the glock to other cluster nodes.
When those nodes go to replay the journal, they may already be holding
glocks for the journal records that should have been synced, but were
not due to the ignored error.

This patch introduces proper error code checking to the go_sync
family of glops functions.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

1c634f94

gfs2: Don't demote a glock until its revokes are written · df5db5f9

由 Bob Peterson 提交于 11月 13, 2019

Before this patch, run_queue would demote glocks based on whether
there are any more holders. But if the glock has pending revokes that
haven't been written to the media, giving up the glock might end in
file system corruption if the revokes never get written due to
io errors, node crashes and fences, etc. In that case, another node
will replay the metadata blocks associated with the glock, but
because the revoke was never written, it could replay that block
even though the glock had since been granted to another node who
might have made changes.

This patch changes the logic in run_queue so that it never demotes
a glock until its count of pending revokes reaches zero.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

df5db5f9

gfs2: Check for log write errors before telling dlm to unlock · d93ae386

由 Bob Peterson 提交于 4月 29, 2019

Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple checks for errors
in do_xmote after the calls to go_sync and go_inval. If an error
is found, we cannot withdraw yet, because the withdraw itself
uses glocks to make the file system read-only. Instead, we flag
the error. Later, asserts should cause another node to replay
the journal before continuing, thus protecting rgrp and dinode
glocks and maintaining the integrity of the metadata. Note that
we only need to do this for journaled glocks. System glocks
should be able to progress even under withdrawn conditions.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

d93ae386

gfs2: fix infinite loop when checking ail item count before go_inval · 33dbd1e4

由 Bob Peterson 提交于 5月 22, 2019

Before this patch, the rgrp_go_inval and inode_go_inval functions each
checked if there were any items left on the ail count (by way of a
count), and if so, did a withdraw. But the withdraw code now uses
glocks when changing the file system to read-only status. So we can
not have glock functions withdrawing or a hang will likely result:
The glocks can't be serviced by the work_func if the work_func is
busy doing its own withdraw.

This patch removes the checks from the go_inval functions and adds
a centralized check in do_xmote to warn about the problem and not
withdraw, but flag the error so it's eventually caught when the logd
daemon eventually runs.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

33dbd1e4

gfs2: Force withdraw to replay journals and wait for it to finish · 601ef0d5

由 Bob Peterson 提交于 1月 28, 2020

When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held.

The "live" glock which is now used to signal when a withdraw
occurs. When a withdraw occurs, the node signals its withdraw by
dequeueing the "live" glock and trying to enqueue it in EX mode,
thus forcing the other nodes to all see a demote request, by way
of a "1CB" (one callback) try lock. The "live" glock is not
granted in EX; the callback is only just used to indicate a
withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Also note that the withdraw function may be called from a wide
variety of situations, and therefore, we need to take extra
precautions to make sure pointers are valid before using them in
many circumstances.

We also need to take care when glocks decide to withdraw, since
the withdraw code now uses glocks.

Also, before this patch, if a process encountered an error and
decided to withdraw, if another process was already withdrawing,
the second withdraw would be silently ignored, which set it free
to unlock its glocks. That's correct behavior if the original
withdrawer encounters further errors down the road. But if
secondary waiters don't wait for the journal replay, unlocking
glocks will allow other nodes to use them, despite the fact that
the journal containing those blocks is being replayed. The
replay needs to finish before our glocks are released to other
nodes. IOW, secondary withdraws need to wait for the first
withdraw to finish.

For example, if an rgrp glock is unlocked by a process that didn't
wait for the first withdraw, a journal replay could introduce file
system corruption by replaying a rgrp block that has already been
granted to a different cluster node.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

601ef0d5

21 2月, 2020 1 次提交

gfs2: Allow some glocks to be used during withdraw · a72d2401

由 Bob Peterson 提交于 6月 13, 2019

We need to allow some glocks to be enqueued, dequeued, promoted, and demoted
when we're withdrawn. For example, to maintain metadata integrity, we should
disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like
iopen or the transaction glocks may be safely used because none of their
metadata goes through the journal. So in general, we should disallow all
glocks with an address space, and allow all the others. One exception is:
we need to allow our active journal to be demoted so others may recover it.

Allowing glocks after withdraw gives us the ability to take appropriate
action (in a following patch) to have our journal properly replayed by
another node rather than just abandoning the current transactions and
pretending nothing bad happened, leaving the other nodes free to modify
the blocks we had in our journal, which may result in file system
corruption.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

a72d2401

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功