1. 14 Jun 2023, 1 commit
  2. 18 Jul 2022, 1 commit
  3. 14 Jan 2022, 1 commit
  4. 19 Oct 2021, 1 commit
  5. 26 Apr 2021, 2 commits
  6. 13 Apr 2021, 2 commits
  7. 30 Oct 2020, 1 commit
  8. 23 Oct 2020, 1 commit
  9. 21 Oct 2020, 1 commit
  10. 15 Oct 2020, 7 commits
    • gfs2: use-after-free in sysfs deregistration · c2a04b02
      Jamie Iles authored
      syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:
      
        Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228
      
        CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
        Hardware name: linux,dummy-virt (DT)
        Workqueue: events kobject_delayed_cleanup
        Call trace:
         dump_backtrace+0x0/0x4d8
         show_stack+0x34/0x48
         dump_stack+0x174/0x1f8
         print_address_description.constprop.0+0x5c/0x550
         kasan_report+0x13c/0x1c0
         __asan_report_load1_noabort+0x34/0x60
         memcmp+0xd0/0xd8
         gfs2_uevent+0xc4/0x188
         kobject_uevent_env+0x54c/0x1240
         kobject_uevent+0x2c/0x40
         __kobject_del+0x190/0x1d8
         kobject_delayed_cleanup+0x2bc/0x3b8
         process_one_work+0x96c/0x18c0
         worker_thread+0x3f0/0xc30
         kthread+0x390/0x498
         ret_from_fork+0x10/0x18
      
        Allocated by task 1110:
         kasan_save_stack+0x28/0x58
         __kasan_kmalloc.isra.0+0xc8/0xe8
         kasan_kmalloc+0x10/0x20
         kmem_cache_alloc_trace+0x1d8/0x2f0
         alloc_super+0x64/0x8c0
         sget_fc+0x110/0x620
         get_tree_bdev+0x190/0x648
         gfs2_get_tree+0x50/0x228
         vfs_get_tree+0x84/0x2e8
         path_mount+0x1134/0x1da8
         do_mount+0x124/0x138
         __arm64_sys_mount+0x164/0x238
         el0_svc_common.constprop.0+0x15c/0x598
         do_el0_svc+0x60/0x150
         el0_svc+0x34/0xb0
         el0_sync_handler+0xc8/0x5b4
         el0_sync+0x15c/0x180
      
        Freed by task 228:
         kasan_save_stack+0x28/0x58
         kasan_set_track+0x28/0x40
         kasan_set_free_info+0x24/0x48
         __kasan_slab_free+0x118/0x190
         kasan_slab_free+0x14/0x20
         slab_free_freelist_hook+0x6c/0x210
         kfree+0x13c/0x460
      
      Use the same pattern as f2fs + ext4 where the kobject destruction must
      complete before allowing the FS itself to be freed.  This means that we
      need an explicit free_sbd in the callers.
      
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Jamie Iles <jamie@nuviainc.com>
      [Also go to fail_free when init_names fails.]
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      c2a04b02
    • gfs2: simplify the logic in gfs2_evict_inode · 0a0d9f55
      Bob Peterson authored
      Now that we've factored out the deleted and undeleted dinode cases
      in gfs2_evict_inode, we can greatly simplify the logic. Now the
      function is easy to read and understand.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      0a0d9f55
    • gfs2: factor evict_linked_inode out of gfs2_evict_inode · d90be6ab
      Bob Peterson authored
      Now that we've factored out the delete-dinode case to simplify
      gfs2_evict_inode, we take it a step further and factor out the other
      case: where we don't delete the inode.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      d90be6ab
    • gfs2: further simplify gfs2_evict_inode with new func evict_should_delete · 53dbc27e
      Bob Peterson authored
      This patch further simplifies function gfs2_evict_inode() by adding a
      new function evict_should_delete. The function may also lock the inode
      glock.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      53dbc27e
    • gfs2: factor evict_unlinked_inode out of gfs2_evict_inode · 6e7e9a50
      Bob Peterson authored
      Function gfs2_evict_inode is way too big, complex and unreadable. This
      is a baby step toward breaking it apart to be more readable. It factors
      out the portion that deletes the online bits for a dinode that is
      unlinked and needs to be deleted. A future patch will factor out more.
      (If I factor out too much, the patch itself becomes unreadable).
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      6e7e9a50
    • gfs2: rename variable error to ret in gfs2_evict_inode · 23d828fc
      Bob Peterson authored
      Function gfs2_evict_inode is too big and unreadable. This patch is just
      a baby step toward improving that. This first step just renames variable
      error to ret. This will help make future patches more readable.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      23d828fc
    • gfs2: Make sure we don't miss any delayed withdraws · 5a61ae14
      Andreas Gruenbacher authored
      Commit ca399c96 changes gfs2_log_flush to not withdraw the
      filesystem while holding the log flush lock, but it fails to check if
      the filesystem needs to be withdrawn once the log flush lock has been
      released.  Likewise, commit f05b86db depends on gfs2_log_flush to
      trigger for delayed withdraws.  Add that and clean up the code flow
      somewhat.
      
      In gfs2_put_super, add a check for delayed withdraws that have been
      missed to prevent these kinds of bugs in the future.
      
      Fixes: ca399c96 ("gfs2: flesh out delayed withdraw for gfs2_log_flush")
      Fixes: f05b86db ("gfs2: Prepare to withdraw as soon as an IO error occurs in log write")
      Cc: stable@vger.kernel.org # v5.7+: 462582b9: gfs2: add some much needed cleanup for log flushes that fail
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      5a61ae14
  11. 07 Aug 2020, 1 commit
  12. 03 Jul 2020, 2 commits
    • gfs2: The freeze glock should never be frozen · c860f8ff
      Bob Peterson authored
      Before this patch, some gfs2 code locked the freeze glock with LM_FLAG_NOEXP
      (Do not freeze) flag, and some did not. We never want to freeze the freeze
      glock, so this patch makes it consistently use LM_FLAG_NOEXP always.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      c860f8ff
    • gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE · 623ba664
      Bob Peterson authored
      Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
      several places. That's wrong because we always want to know the state
      of whether the file system is frozen.
      
      There was also a problem with freeze/thaw transitioning the glock from
      frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
      to processes that request it in SH mode, unless GL_EXACT is specified.
      Therefore, the freeze/thaw code, which tried to reacquire the glock in
      SH mode would get the glock in EX mode, and miss the transition from EX
      to SH. That made it think the thaw had completed normally, but since the
      glock was still cached in EX, other nodes could not freeze again.
      
      This patch removes the GL_NOCACHE flag to allow the freeze glock to be
      cached. It also adds the GL_EXACT flag so the glock is fully transitioned
      from EX to SH, thereby allowing future freeze operations.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      623ba664
  13. 06 Jun 2020, 5 commits
    • gfs2: Smarter iopen glock waiting · 9e8990de
      Andreas Gruenbacher authored
      When trying to upgrade the iopen glock from a shared to an exclusive lock in
      gfs2_evict_inode, abort the wait if there is contention on the corresponding
      inode glock: in that case, the inode must still be in active use on another
      node, and we're not guaranteed to get the iopen glock anytime soon.
      
      To make this work even better, when we notice contention on the iopen glock and
      we can't evict the corresponding inode and release the iopen glock immediately,
      poke the inode glock.  The other node(s) trying to acquire the lock can then
      abort instead of timing out.
      
      Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
      version of this patch.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      9e8990de
    • gfs2: Try harder to delete inodes locally · 9e73330f
      Andreas Gruenbacher authored
      When an inode's link count drops to zero and the inode is cached on
      other nodes, the current behavior of gfs2 is to immediately give up and
      to rely on the other node(s) to delete the inode if there is iopen glock
      contention.  This leads to resource group glock bouncing and the loss of
      caching.  With the previous patches in place, we can fix that by not
      giving up immediately.
      
      When the inode is still open on other nodes, those nodes won't be able
      to evict the inode and give up the iopen glock.  In that case, our lock
      conversion request will time out.  The unlink system call will block for
      the duration of the iopen lock conversion request.  We're also holding
      the inode glock in EX mode for an extended duration, so other nodes
      won't be able to make progress on the inode, either.
      
      This is worse than what we had before, but we can prevent other nodes
      from getting stuck by aborting our iopen locking request if there is
      contention on the inode glock.  This will be the subject of a future
      patch.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      9e73330f
    • gfs2: Give up the iopen glock on contention · 8c7b9262
      Andreas Gruenbacher authored
      When there's contention on the iopen glock, it means that the link count
      of the corresponding inode has dropped to zero on a remote node which is
      now trying to delete the inode.  In that case, try to evict the inode so
      that the iopen glock will be released, which will allow the remote node
      to do its job.
      
      When the inode is still open locally, the inode's reference count won't
      drop to zero and so we'll keep holding the inode and its iopen glock.
      The remote node will time out its request to grab the iopen glock, and
      when the inode is finally closed locally, we'll try to delete it
      ourselves.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      8c7b9262
    • gfs2: Turn gl_delete into a delayed work · a0e3cc65
      Andreas Gruenbacher authored
      This requires flushing delayed work items in gfs2_make_fs_ro (which is called
      before unmounting a filesystem).
      
      When inodes are deleted and then recreated, pending gl_delete work items would
      have no effect because the inode generations will have changed, so we can
      cancel any pending gl_delete works before reusing iopen glocks.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      a0e3cc65
    • gfs2: Keep track of deleted inode generations in LVBs · f286d627
      Andreas Gruenbacher authored
      When deleting an inode, keep track of the generation of the deleted inode in
      the inode glock Lock Value Block (LVB).  When trying to delete an inode
      remotely, check the last-known inode generation against the deleted inode
      generation to skip duplicate remote deletes.  This avoids taking the resource
      group glock in order to verify the block type.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      f286d627
  14. 09 May 2020, 1 commit
    • gfs2: Fix problems regarding gfs2_qa_get and _put · 2297ab61
      Bob Peterson authored
      This patch fixes a couple of places in which gfs2_qa_get and gfs2_qa_put are
      not balanced: we now keep references around whenever a file is open for writing
      (see gfs2_open_common and gfs2_release), so we need to put all references we
      grab in function gfs2_create_inode.  This was broken in the successful case and
      on one error path.
      
      This also means that we don't have a reference to put in gfs2_evict_inode.
      
      In addition, gfs2_qa_put was called for the wrong inode in gfs2_link.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      2297ab61
  15. 28 Mar 2020, 4 commits
  16. 27 Feb 2020, 1 commit
    • gfs2: Force withdraw to replay journals and wait for it to finish · 601ef0d5
      Bob Peterson authored
      When a node withdraws from a file system, it often leaves its journal
      in an incomplete state. This is especially true when the withdraw is
      caused by io errors writing to the journal. Before this patch, a
      withdraw would try to write a "shutdown" record to the journal and
      tell dlm it was done with the file system, so none of the other
      nodes knew about the problem. Later, when the problem was fixed and
      the withdrawn node was rebooted, it would discover that its own
      journal was incomplete and replay it. However, replaying it at that
      point is almost guaranteed to introduce corruption, because the
      other nodes are likely to have used resource groups that appear in
      the journal since the time of the withdraw. Replaying the journal
      later would overwrite those changes, through no fault of dlm, which
      was instructed during the withdraw to release those resources.
      
      This patch makes file system withdraws visible to the entire cluster.
      Withdrawing nodes dequeue their journal glock to allow recovery.
      
      The remaining nodes check all the journals to see if they are
      clean or in need of replay. They try to replay dirty journals, but
      only the journals of withdrawn nodes will be "not busy" and
      therefore available for replay.
      
      Until the journal replay is complete, no i/o related glocks may be
      given out, to ensure that the replay does not cause the
      aforementioned corruption: We cannot allow any journal replay to
      overwrite blocks associated with a glock once it is held.
      
      The "live" glock is now used to signal when a withdraw occurs: the
      withdrawing node dequeues the "live" glock and tries to enqueue it
      in EX mode, forcing the other nodes to see a demote request by way
      of a "1CB" (one callback) try lock. The "live" glock is never
      granted in EX; the callback is only used to indicate that a
      withdraw has occurred.
      
      Note that all nodes in the cluster must wait for the recovering
      node to finish replaying the withdrawing node's journal before
      continuing. To this end, it checks that the journals are clean
      multiple times in a retry loop.
      
      Also note that the withdraw function may be called from a wide
      variety of situations, and therefore, we need to take extra
      precautions to make sure pointers are valid before using them in
      many circumstances.
      
      We also need to take care when glocks decide to withdraw, since
      the withdraw code now uses glocks.
      
      Also, before this patch, if a process encountered an error and
      decided to withdraw, if another process was already withdrawing,
      the second withdraw would be silently ignored, which set it free
      to unlock its glocks. That's correct behavior if the original
      withdrawer encounters further errors down the road. But if
      secondary waiters don't wait for the journal replay, unlocking
      glocks will allow other nodes to use them, despite the fact that
      the journal containing those blocks is being replayed. The
      replay needs to finish before our glocks are released to other
      nodes. IOW, secondary withdraws need to wait for the first
      withdraw to finish.
      
      For example, if an rgrp glock is unlocked by a process that didn't
      wait for the first withdraw, a journal replay could introduce file
      system corruption by replaying a rgrp block that has already been
      granted to a different cluster node.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      601ef0d5
  17. 16 Nov 2019, 1 commit
    • gfs2: Abort gfs2_freeze if io error is seen · 52b1cdcb
      Bob Peterson authored
      Before this patch, an io error, such as -EIO writing to the journal
      would cause function gfs2_freeze to go into an infinite loop,
      continuously retrying the freeze operation. But nothing ever clears
      the -EIO except unmount after withdraw, which is impossible if the
      freeze operation never ends (fails). Instead you get:
      
      [ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
      [ 6499.773058] gfs2: fsid=dm-32.0: retrying...
      [ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
      [ 6500.797015] gfs2: fsid=dm-32.0: retrying...
      
      This patch adds a check for -EIO in gfs2_freeze, and if seen, it
      dequeues the freeze glock, aborts the loop and returns the error.
      Also, there's no need to pass the freeze holder to function
      gfs2_lock_fs_check_clean since it's only called in one place and
      it's a well-known superblock pointer, so this simplifies that.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      52b1cdcb
  18. 15 Nov 2019, 2 commits
  19. 19 Sep 2019, 1 commit
  20. 10 Aug 2019, 1 commit
  21. 28 Jun 2019, 3 commits