Commits · 0809e1087c3d6f0aeb3246114a06c96bb4102274 · openanolis / cloud-kernel

18 Dec, 2019 · 1 commit

gfs2: fix glock reference problem in gfs2_trans_remove_revoke · 0809e108

Committed by Bob Peterson on Nov 14, 2019

[ Upstream commit fe5e7ba11fcf1d75af8173836309e8562aefedef ]

Commit 9287c6452d2b fixed a situation in which gfs2 could use a glock
after it had been freed. To do that, it temporarily added a new glock
reference by calling gfs2_glock_hold in function gfs2_add_revoke.
However, if the bd element was removed by gfs2_trans_remove_revoke, it
failed to drop the additional reference.

This patch adds logic to gfs2_trans_remove_revoke to properly drop the
additional glock reference.

Fixes: 9287c6452d2b ("gfs2: Fix occasional glock use-after-free")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

0809e108

05 Dec, 2019 · 1 commit

gfs2: take jdata unstuff into account in do_grow · 7baf8fd1

Committed by Bob Peterson on Dec 18, 2018

[ Upstream commit bc0205612bbd4dd4026d4ba6287f5643c37366ec ]

Before this patch, function do_grow would not reserve enough journal
blocks in the transaction to unstuff jdata files while growing them.
This patch adds the logic to add one more block if the file to grow
is jdata.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

7baf8fd1

01 Dec, 2019 · 1 commit

gfs2: Fix marking bitmaps non-full · fa3fe5f4

Committed by Andreas Gruenbacher on Sep 27, 2018

[ Upstream commit ec23df2b0cf3e1620f5db77972b7fb735f267eff ]

Reservations in gfs2 can span multiple gfs2_bitmaps (but they won't span
multiple resource groups).  When removing a reservation, we want to
clear the GBF_FULL flags of all involved gfs2_bitmaps, not just that of
the first bitmap.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

fa3fe5f4
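
Illustrative note, not part of the commit: a minimal C sketch of the idea, assuming a simplified `struct bitmap` in which each bitmap covers a fixed run of blocks within one resource group. The struct, the `GBF_FULL` value, and `clear_full_flags` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

#define GBF_FULL 1u  /* hypothetical flag value for this sketch */

/* Simplified stand-in for struct gfs2_bitmap: each bitmap covers
 * `blocks` consecutive blocks of one resource group. */
struct bitmap {
	unsigned int flags;
	unsigned int blocks;
};

/* Clear GBF_FULL on every bitmap that overlaps the reservation
 * [start, start + len), not only on the bitmap containing `start`. */
static void clear_full_flags(struct bitmap *bi, size_t nbitmaps,
			     unsigned int start, unsigned int len)
{
	unsigned int first = 0;          /* first block covered by bi[i] */
	unsigned int end = start + len;  /* exclusive end of reservation */

	for (size_t i = 0; i < nbitmaps; i++) {
		unsigned int last = first + bi[i].blocks;  /* exclusive */

		if (start < last && end > first)
			bi[i].flags &= ~GBF_FULL;
		first = last;
	}
}
```

A reservation that starts near the end of one bitmap and spills into the next clears the flag on both, while bitmaps it does not touch keep their state.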

24 Nov, 2019 · 2 commits

GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads · 4d7cf69b

Committed by Tim Smith on Oct 08, 2018

[ Upstream commit 1eb8d7387908022951792a46fa040ad3942b3b08 ]

Flushing the workqueue can cause operations to happen which might
call gfs2_log_reserve(), or get stuck waiting for locks taken by such
operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
will never wake because the only thing which can wake it is gfs2_logd()
which was already stopped.

This causes umount of a gfs2 filesystem to wedge permanently if, for
example, the umount immediately follows a large delete operation.

When this occurred, the following stack trace was obtained from the
umount command:

[<ffffffff81087968>] flush_workqueue+0x1c8/0x520
[<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
[<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
[<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b7ff7>] kill_block_super+0x27/0x70
[<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
[<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b79b9>] deactivate_super+0x59/0x60
[<ffffffff811d2998>] cleanup_mnt+0x58/0x80
[<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c87d>] task_work_run+0x7d/0xa0
[<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Tim Smith <tim.smith@citrix.com>
Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

4d7cf69b

gfs2: slow the deluge of io error messages · f3afad5d

Committed by Bob Peterson on Oct 04, 2018

[ Upstream commit b524abcc01483b2ac093cc6a8a2a7375558d2b64 ]

When an io error is hit, it calls gfs2_io_error_bh_i for every
journal buffer it can't write. Since we changed gfs2_io_error_bh_i
recently to withdraw later in the cycle, it sends a flood of
errors to the console. This patch checks for the file system already
being withdrawn, and if so, doesn't send more messages. It doesn't
stop the flood of messages, but it slows it down and keeps it more
reasonable.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

f3afad5d

21 Nov, 2019 · 1 commit

gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated · 48b128cd

Committed by Bob Peterson on Aug 16, 2018

[ Upstream commit 4f36cb36c9d14340bb200d2ad9117b03ce992cfe ]

The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when
a rgrp buffer is valid. It's cleared when the glock is invalidated,
signifying that the buffer data is now invalid. But before this
patch, function update_rgrp_lvb was setting the flag when it
determined it had a valid lvb. But that's an invalid assumption:
just because you have a valid lvb doesn't mean you have valid
buffers. After all, another node may have made the lvb valid,
and this node just fetched it from the glock via dlm.

Consider this scenario:
1. The file system is mounted with RGRPLVB option.
2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks
   to GL_SKIP, it skips the gfs2_rgrp_bh_get.
3. Since loops == 0 and the allocation target (ap->target) is
   bigger than the largest known chunk of blocks in the rgrp
   (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses
   the call to gfs2_rgrp_bh_get there as well.
4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses
   gfs2_rgrp_bh_get, but it still sets GFS2_RDF_UPTODATE due
   to this invalid assumption.
5. The next time update_rgrp_lvb is called, it sees the bit is set
   and just returns 0, assuming the lvb and rgrp are both
   uptodate. But since this is a smaller allocation, or space has
   been freed by another node, thus adjusting the lvb values,
   it decides to use the rgrp for allocations, with invalid rd_free
   due to the fact it was never updated.

This patch changes update_rgrp_lvb so it doesn't set the UPTODATE
flag anymore. That way, it has no choice but to fetch the latest
values.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

48b128cd

05 Oct, 2019 · 1 commit

gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps · e0c1e6e5

Committed by Bob Peterson on Sep 12, 2019

commit f0b444b349e33ae0d3dd93e25ca365482a5d17d4 upstream.

In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.

This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.

Fixes: d552a2b9 ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e0c1e6e5

16 Aug, 2019 · 1 commit

gfs2: gfs2_walk_metadata fix · 21344f05

Committed by Andreas Gruenbacher on Aug 05, 2019

commit a27a0c9b6a208722016c8ec5ad31ec96082b91ec upstream.

It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size.  This will confuse fiemap as well as lseek with the
SEEK_DATA flag.

Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.

Fixes xfstest generic/490.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

21344f05

31 May, 2019 · 3 commits

gfs2: Fix occasional glock use-after-free · c4b51dbc

Committed by Andreas Gruenbacher on Apr 04, 2019

[ Upstream commit 9287c6452d2b1f24ea8e84bd3cf6f3c6f267f712 ]

This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

c4b51dbc
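
Illustrative note, not part of the commit: the "hold on the first revoke, put on the last" rule can be sketched in plain C. This is a hypothetical single-threaded model; `struct glock`, `add_revoke`, and `drop_revoke` are invented for the sketch, and the kernel's atomic counters and locking are left out.

```c
#include <stdbool.h>

/* Hypothetical, single-threaded model of a glock. */
struct glock {
	int refs;      /* stands in for the glock reference count */
	int revokes;   /* stands in for gl_revokes */
	bool freed;    /* set where real code would free the object */
};

static void glock_hold(struct glock *gl)
{
	gl->refs++;
}

static void glock_put(struct glock *gl)
{
	if (--gl->refs == 0)
		gl->freed = true;
}

/* The first revoke queued for this glock pins it. */
static void add_revoke(struct glock *gl)
{
	if (gl->revokes++ == 0)
		glock_hold(gl);
}

/* When the last revoke goes away, the pin is dropped. */
static void drop_revoke(struct glock *gl)
{
	if (--gl->revokes == 0)
		glock_put(gl);
}
```

With this pairing, an evict that drops its own reference while revokes are still pending leaves the object alive; whichever side finishes last performs the final put.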

gfs2: Fix lru_count going negative · bac85208

Committed by Ross Lagerwall on Mar 27, 2019

[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]

Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
    negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
   decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
   lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
   incrementing lru_count.
5) The glock is moved to lru_list.
6) The original thread doesn't dispose it because it has been re-added
   to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bac85208
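
Illustrative note, not part of the commit: a minimal sketch of counting by flag rather than by list membership. It is a hypothetical single-threaded model; the names `add_to_lru`, `move_to_dispose`, and the `lru_flag` field are invented here, and the lru lock is omitted.

```c
#include <stdbool.h>

/* Which list the glock is currently linked on. */
enum where { NOWHERE, ON_LRU, ON_DISPOSE };

/* Hypothetical model of a glock for this sketch. */
struct glock {
	enum where list;
	bool lru_flag;    /* set exactly while the glock is on lru_list */
};

static long lru_count;

static void add_to_lru(struct glock *gl)
{
	/* Test the flag, not "is it on some list": a glock parked on
	 * the dispose list must be counted again when it comes back. */
	if (!gl->lru_flag) {
		gl->lru_flag = true;
		lru_count++;
	}
	gl->list = ON_LRU;
}

static void move_to_dispose(struct glock *gl)
{
	if (gl->lru_flag) {
		gl->lru_flag = false;
		lru_count--;
	}
	gl->list = ON_DISPOSE;
}
```

Because the flag changes only together with the counter, re-adding a glock that sits on the dispose list restores the count instead of leaving it one short.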

gfs2: Fix sign extension bug in gfs2_update_stats · fdc78eed

Committed by Andreas Gruenbacher on May 17, 2019

commit 5a5ec83d6ac974b12085cd99b196795f14079037 upstream.

Commit 4d207133 changed the types of the statistic values in struct
gfs2_lkstats from s64 to u64. Because of that, what should be a signed
value in gfs2_update_stats turned into an unsigned value. When shifted
right, we end up with a large positive value instead of a small negative
value, which results in an incorrect variance estimate.

Fixes: 4d207133 ("gfs2: Make statistics unsigned, suitable for use with do_div()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fdc78eed
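
Illustrative note, not part of the commit: the effect of shifting a negative delta through an unsigned type can be reproduced in user space. The sketch uses `uint64_t` and `int64_t` in place of the kernel's u64 and s64; the function names are invented for the example.

```c
#include <stdint.h>

/* Shift performed on the unsigned type: the sign bit is treated as
 * an ordinary high bit, so a "negative" delta becomes a huge value. */
static uint64_t shift_unsigned(uint64_t delta, unsigned int n)
{
	return delta >> n;
}

/* Shift performed on the signed type. Right-shifting a negative value
 * is implementation-defined in ISO C; GCC and Clang shift
 * arithmetically, which keeps the result small and negative. */
static int64_t shift_signed(int64_t delta, unsigned int n)
{
	return delta >> n;
}
```

For a delta of -8 shifted right by 2, the signed version gives -2 on GCC and Clang, while the unsigned version gives 0x3FFFFFFFFFFFFFFE.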

14 Mar, 2019 · 1 commit

gfs2: Fix missed wakeups in find_insert_glock · 4f5a4c88

Committed by Andreas Gruenbacher on Mar 06, 2019

commit 605b0487f0bc1ae9963bf52ece0f5c8055186f81 upstream.

Mark Syms has reported seeing tasks that are stuck waiting in
find_insert_glock.  It turns out that struct lm_lockname contains four padding
bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
hashing the glock name.  As a result, we can end up waking up the wrong
waitqueue, and the waiting tasks may be stuck forever.

Fix that by using ht_parms.key_len instead of sizeof(struct lm_lockname) for
the key length.
Reported-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4f5a4c88
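
Illustrative note, not part of the commit: a minimal sketch of hashing a key only up to the end of its last real field. `struct lockname` is a simplified stand-in for struct lm_lockname, `KEY_LEN` plays the role of ht_parms.key_len, and FNV-1a is used only as a stand-in hash; none of these are the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct lm_lockname. On common 64-bit ABIs
 * the compiler pads it to a multiple of 8 bytes after `type`. */
struct lockname {
	uint64_t number;
	unsigned int type;
};

/* Key length that stops at the end of the last real field. */
#define KEY_LEN (offsetof(struct lockname, type) + sizeof(unsigned int))

/* Fill the whole object with `fill` first to mimic stale padding. */
static void init_name(struct lockname *n, uint64_t number,
		      unsigned int type, int fill)
{
	memset(n, fill, sizeof(*n));
	n->number = number;
	n->type = type;
}

/* FNV-1a over `len` bytes, a stand-in for the real hash function. */
static uint32_t hash_bytes(const void *p, size_t len)
{
	const unsigned char *b = p;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= b[i];
		h *= 16777619u;
	}
	return h;
}

static uint32_t hash_name(const struct lockname *name)
{
	return hash_bytes(name, KEY_LEN);  /* not sizeof(*name) */
}
```

Two names with equal fields hash identically here whatever their padding holds, which is the property the wait queue lookup needs.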

07 Feb, 2019 · 1 commit

gfs2: Revert "Fix loop in gfs2_rbm_find" · 8b9be9db

Committed by Andreas Gruenbacher on Jan 30, 2019

commit e74c98ca2d6ae4376cc15fa2a22483430909d96b upstream.

This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.

It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone.  Let's revert this
for now to have more time for a proper fix.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8b9be9db

13 Jan, 2019 · 2 commits

gfs2: Fix loop in gfs2_rbm_find · 6ef56c9a

Committed by Andreas Gruenbacher on Dec 04, 2018

commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.

Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f broke.  The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.

Fixes: e579ed4f ("GFS2: Introduce rbm field bii")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6ef56c9a

gfs2: Get rid of potential double-freeing in gfs2_create_inode · 31048610

Committed by Andreas Gruenbacher on Nov 26, 2018

commit 6ff9b09e00a441599f3aacdf577254455a048bc9 upstream.

In gfs2_create_inode, after setting and releasing the acl / default_acl, the
acl / default_acl pointers are not set to NULL as they should be.  In that
state, when the function reaches label fail_free_acls, gfs2_create_inode will
try to release the same acls again.

Fix that by setting the pointers to NULL after releasing the acls.  Slightly
simplify the logic.  Also, posix_acl_release checks for NULL already, so
there is no need to duplicate those checks here.

Fixes: e01580bf ("gfs2: use generic posix ACL infrastructure")
Reported-by: Pan Bian <bianpan2016@163.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

31048610
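
Illustrative note, not part of the commit: the "release, then set the pointer to NULL" pattern can be sketched as follows. `struct acl`, `acl_release`, and `create_with_acl` are hypothetical; only the NULL-tolerant release and the `fail_free_acls` label mirror what the message describes.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted ACL object. */
struct acl {
	int refs;
};

static int release_calls;   /* counts real releases, for illustration */

/* NULL is a no-op, as the message says of posix_acl_release. */
static void acl_release(struct acl *a)
{
	if (!a)
		return;
	a->refs--;
	release_calls++;
}

/* Release on the normal path, then clear the pointer so that a later
 * jump to the cleanup label cannot release the same object again. */
static int create_with_acl(struct acl *acl, int fail_late)
{
	acl_release(acl);
	acl = NULL;

	if (fail_late)
		goto fail_free_acls;
	return 0;

fail_free_acls:
	acl_release(acl);   /* harmless: acl is NULL here */
	return -1;
}
```

Even when the late failure path is taken, the object is released exactly once.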

01 Dec, 2018 · 2 commits

gfs2: Fix iomap buffer head reference counting bug · 38084377

Committed by Andreas Gruenbacher on Nov 11, 2018

commit c26b5aa8 upstream.

GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
gfs2_iomap_end in iomap->private.  It sets that private pointer in
gfs2_iomap_get.  Users of gfs2_iomap_get other than gfs2_iomap_begin
would have to release iomap->private, but this isn't done correctly,
leading to a leak of buffer head references.

To fix this, move the code for setting iomap->private from
gfs2_iomap_get to gfs2_iomap_begin.

Fixes: 64bc06bb ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

38084377

gfs2: Don't leave s_fs_info pointing to freed memory in init_sbd · 331bd738

Committed by Andrew Price on Oct 08, 2018

commit 4c62bd9c upstream.

When alloc_percpu() fails, sdp gets freed but sb->s_fs_info still points
to the same address. Move the assignment after that error check so that
s_fs_info can only point to a valid sdp or NULL, which is checked for
later in the error path, in gfs2_kill_super().

Reported-by: syzbot+dcb8b3587445007f5808@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

331bd738

21 Nov, 2018 · 2 commits

gfs2: Fix metadata read-ahead during truncate (2) · 55795dac

Committed by Andreas Gruenbacher on Nov 08, 2018

commit e7445ced upstream.

The previous attempt to fix metadata read-ahead during truncate was
incorrect: for files with a height > 2 (1006989312 bytes with a block
size of 4096 bytes), read-ahead requests were not being issued for some
of the indirect blocks discovered while walking the metadata tree,
leading to significant slow-downs when deleting large files.  Fix that.

In addition, only issue read-ahead requests in the first pass through
the meta-data tree, while deallocating data blocks.

Fixes: c3ce5aa9 ("gfs2: Fix metadata read-ahead during truncate")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

55795dac
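
Illustrative note, not part of the commit: the 1006989312-byte threshold is consistent with the arithmetic below. The header sizes (232 bytes for the dinode header, 24 bytes for an indirect block's metadata header) are assumptions that reproduce the figure quoted in the message; the function name is invented.

```c
#include <stdint.h>

/* Assumed on-disk header sizes for this sketch. */
#define DINODE_HDR 232u
#define META_HDR    24u
#define PTR_SIZE     8u   /* one 64-bit block pointer */

/* Largest file size addressable by a metadata tree of `height`
 * levels (height >= 1) for the given block size. */
static uint64_t max_size_for_height(unsigned int height, uint32_t bsize)
{
	uint64_t dptrs = (bsize - DINODE_HDR) / PTR_SIZE;  /* in the dinode */
	uint64_t iptrs = (bsize - META_HDR) / PTR_SIZE;    /* per indirect block */
	uint64_t blocks = dptrs;

	for (unsigned int h = 1; h < height; h++)
		blocks *= iptrs;
	return blocks * bsize;
}
```

With 4096-byte blocks this gives 483 pointers in the dinode and 509 per indirect block, so height 2 tops out at 483 × 509 × 4096 = 1006989312 bytes; larger files need a height above 2.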

gfs2: Put bitmap buffers in put_super · 8793f67a

Committed by Andreas Gruenbacher on Nov 05, 2018

commit 10283ea5 upstream.

gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects
attached to the resource group glocks.  That function should release the
buffers attached to the gfs2_bitmap objects (bi_bh), but the call to
gfs2_rgrp_brelse for doing that is missing.

When gfs2_releasepage later runs across these buffers which are still
referenced, it refuses to free them.  This causes the pages the buffers
are attached to to remain referenced as well.  With enough mount/unmount
cycles, the system will eventually run out of memory.

Fix this by adding the missing call to gfs2_rgrp_brelse in
gfs2_clear_rgrpd.

(Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.)

Fixes: 39b0f1e9 ("GFS2: Don't brelse rgrp buffer_heads every allocation")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8793f67a

14 Nov, 2018 · 1 commit

gfs2_meta: ->mount() can get NULL dev_name · 8c448126

Committed by Al Viro on Oct 13, 2018

commit 3df629d873f8683af6f0d34dfc743f637966d483 upstream.

get in sync with mount_bdev() handling of the same

Reported-by: syzbot+c54f8e94e6bba03b04e9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8c448126

12 Oct, 2018 · 1 commit

gfs2: Fix iomap buffered write support for journaled files (2) · fee5150c

Committed by Andreas Gruenbacher on Oct 10, 2018

It turns out that the fix in commit 6636c3cc56 is bad; the assertion
that the iomap code no longer creates buffer heads is incorrect for
filesystems that set the IOMAP_F_BUFFER_HEAD flag.

Instead, what's happening is that gfs2_iomap_begin_write treats all
files that have the jdata flag set as journaled files, which is
incorrect as long as those files are inline ("stuffed").  We're handling
stuffed files directly via the page cache, which is why we ended up with
pages without buffer heads in gfs2_page_add_databufs.

Fix this by handling stuffed journaled files correctly in
gfs2_iomap_begin_write.

This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

fee5150c

10 Oct, 2018 · 1 commit

gfs2: Fix iomap buffered write support for journaled files · dc480feb

Committed by Andreas Gruenbacher on Oct 09, 2018

Commit 64bc06bb broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done.  However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs.  Fix
that by creating buffer heads ourselves when needed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

dc480feb

08 Aug, 2018 · 1 commit

gfs2: eliminate update_rgrp_lvb_unlinked · f5580d0f

Committed by Bob Peterson on Aug 08, 2018

Function update_rgrp_lvb_unlinked used to do the same thing as
be32_add_cpu. This patch removes it in favor of using be32_add_cpu
directly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>

f5580d0f
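
Illustrative note, not part of the commit: what an "add to a big-endian 32-bit field" helper does can be shown with a user-space analogue. `be32_add` is a hypothetical name for this sketch, and `htonl()`/`ntohl()` stand in for the kernel's byte-order conversions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* `var` holds a big-endian 32-bit value: convert it to CPU byte
 * order, add `val`, and store the sum back in big-endian order. */
static void be32_add(uint32_t *var, uint32_t val)
{
	*var = htonl(ntohl(*var) + val);
}
```

Since the addition wraps modulo 2^32, passing `(uint32_t)-1` decrements the stored counter by one.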

07 Aug, 2018 · 1 commit

gfs2: Fix gfs2_testbit to use clone bitmaps · dffe12a8

Committed by Bob Peterson on Aug 07, 2018

Function gfs2_testbit is called in three places. Two of those places,
gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
by the block reservations scheme to determine the length of an extent of
free blocks. Before this patch, it wasn't using the clone bitmap, which
means recently-freed blocks were treated as free blocks for the purposes
of an allocation.

This patch adds a new parameter to gfs2_testbit to indicate whether or
not the clone bitmaps should be used (if available).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

dffe12a8
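
Illustrative note, not part of the commit: a minimal sketch of a bit test that can read either the real or the clone bitmap. It assumes two state bits per block with 0 meaning free and 1 meaning in use; `struct bitmap` and `testbit` are simplified, hypothetical stand-ins for the kernel's versions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BLOCK 2u                       /* two state bits per block */
#define BLOCKS_PER_BYTE (8u / BITS_PER_BLOCK)
#define STATE_MASK 3u

/* Simplified stand-in for struct gfs2_bitmap; `clone` may be NULL. */
struct bitmap {
	const uint8_t *real;
	const uint8_t *clone;
};

/* Return the 2-bit state of `block`, reading the clone bitmap when
 * the caller asks for it and one exists. */
static unsigned int testbit(const struct bitmap *bi, size_t block,
			    bool use_clone)
{
	const uint8_t *buf = (use_clone && bi->clone) ? bi->clone : bi->real;
	unsigned int shift = (block % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;

	return (buf[block / BLOCKS_PER_BYTE] >> shift) & STATE_MASK;
}
```

A block that is already free in the real bitmap but still marked in the clone reads as in use when `use_clone` is true, so a recently freed block is not handed out again too early.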

03 Aug, 2018 · 1 commit

gfs2: Get rid of gfs2_ea_strlen · 21e2156f

Committed by Andreas Gruenbacher on Aug 03, 2018

Function gfs2_ea_strlen is only called from ea_list_i, so inline it
there.  Remove the duplicate switch statement and the creative use of
memcpy to set a null byte.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>

21e2156f

27 Jul, 2018 · 1 commit

gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_out · 3f30f929

Committed by Bob Peterson on Jul 26, 2018

Before this patch gfs2_rgrp_ondisk2lvb was called after every call
to gfs2_rgrp_out. This patch just calls it directly from within
gfs2_rgrp_out, and moves the function to be before it so we don't
need a function prototype.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

3f30f929

26 Jul, 2018 · 1 commit

gfs2: Special-case rindex for gfs2_grow · 77612578

Committed by Andreas Gruenbacher on Jul 25, 2018

To speed up the common case of appending to a file,
gfs2_write_alloc_required presumes that writing beyond the end of a file
will always require additional blocks to be allocated. This assumption
is incorrect for preallocated files, but there are no negative
consequences as long as *some* space is still left on the filesystem.

One special file that always has some space preallocated beyond the end
of the file is the rindex: when growing a filesystem, gfs2_grow adds one
or more new resource groups and appends records describing those
resource groups to the rindex; the preallocated space ensures that this
is always possible.

However, when a filesystem is completely full, gfs2_write_alloc_required
will indicate that an additional allocation is required, and appending
the next record to the rindex will fail even though space for that
record has already been preallocated. To fix that, skip the incorrect
optimization in gfs2_write_alloc_required, but for the rindex only.
Other writes to preallocated space beyond the end of the file are still
allowed to fail on completely full filesystems.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>

77612578

25 Jul, 2018 · 9 commits

GFS2: rgrp free blocks used incorrectly · f6753df3

Committed by Bob Peterson on May 30, 2018

Before this patch, several functions in rgrp.c checked the value of
rgd->rd_free_clone. That does not take into account blocks that were
reserved by a multi-block reservation. This causes a problem when
space gets tight in the file system. For example, when function
gfs2_inplace_reserve checks to see if a rgrp has enough blocks to
satisfy the request, it can accept a rgrp that it should reject
because, although there are enough blocks to satisfy the request
_now_, those blocks may be reserved for another running process.

A second problem with this occurs when we've reserved the remaining
blocks in an rgrp: function rg_mblk_search() can reject an rgrp
improperly because it calculates:

   u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved;

But rd_reserved includes blocks that the current process just
reserved in its own call to inplace_reserve. For example, it can
reserve the last 128 blocks of an rgrp, then reject that same rgrp
because the above calculates out to free_blocks = 0;

Consequences include, but are not limited to, (1) leaving holes,
and thus increasing file system fragmentation, and (2) reporting
file system is full long before it actually is.

This patch introduces a new function, rgd_free, which returns the
number of clone-free blocks (blocks that are truly free as opposed
to blocks that are still being used because an unlinked file is
still open) minus the number of blocks reserved by processes, but
not counting the blocks we ourselves reserved (because obviously
we need to allocate them).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

f6753df3
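
Illustrative note, not part of the commit: the calculation described for rgd_free can be sketched as below. `struct rgrp` and `rgd_free_sketch` are simplified, hypothetical stand-ins; only the field names rd_free_clone and rd_reserved come from the message, and `own` is an assumed parameter for the caller's share of the reservations.

```c
#include <stdint.h>

/* Simplified stand-in carrying the two fields named in the message. */
struct rgrp {
	uint32_t rd_free_clone;  /* clone-free blocks in the rgrp */
	uint32_t rd_reserved;    /* blocks reserved by all processes */
};

/* Blocks this caller may still allocate: clone-free blocks minus the
 * reservations of other processes. `own` is the part of rd_reserved
 * that belongs to the caller. */
static uint32_t rgd_free_sketch(const struct rgrp *rgd, uint32_t own)
{
	uint32_t others;

	if (rgd->rd_reserved < own)
		return 0;            /* inconsistent accounting */
	others = rgd->rd_reserved - own;
	if (rgd->rd_free_clone < others)
		return 0;
	return rgd->rd_free_clone - others;
}
```

In the 128-block example from the message, a caller holding all 128 reserved blocks sees 128 usable blocks, whereas the plain subtraction `rd_free_clone - rd_reserved` yields 0.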

gfs2: remove redundant variable 'moved' · d1b0cb93

Committed by Colin Ian King on Jul 17, 2018

Variable 'moved' is being assigned but is never used hence it is
redundant and can be removed.  This has been the case ever since commit
c752666c.

Cleans up clang warning:
warning: variable 'moved' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

d1b0cb93

gfs2: use iomap_readpage for blocksize == PAGE_SIZE · f95cbb44

Committed by Andreas Gruenbacher on Jun 06, 2018

We only use iomap_readpage for pages that don't have buffer heads
attached yet: iomap_readpage would otherwise read pages from disk that
are marked buffer_uptodate() but not PageUptodate().  Those pages may
actually contain data more recent than what's on disk.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>

f95cbb44

gfs2: Use iomap for stuffed direct I/O reads · 1d45bb7f

Committed by Andreas Gruenbacher on Jun 27, 2018

Remove the fallback code from direct to buffered I/O for stuffed reads.

For stuffed writes, we must keep the fallback code: the deferred glock
we are holding under direct I/O doesn't allow writing to the inode or
changing the file size.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>

1d45bb7f

gfs2: fallocate_chunk: Always initialize struct iomap · c2589282

Committed by Andreas Gruenbacher on Jul 06, 2018

In fallocate_chunk, always initialize the iomap before calling
gfs2_iomap_get_alloc: future changes could otherwise cause things like
iomap.flags to leak across calls.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>

c2589282

GFS2: Fix recovery issues for spectators · 4a772772

Committed by Bob Peterson on Jul 05, 2018

This patch fixes a couple of problems dealing with spectators who
remain with gfs2 mounts after the last non-spectator node fails.

Before this patch, spectator mounts would try to acquire the dlm's
mounted lock EX as part of their normal recovery sequence.
The mounted lock is only used to determine whether the node is
the first mounter, the first node to mount the file system, for
the purposes of file system recovery and journal replay.

It's not necessary for spectators: they should never do journal
recovery. If they acquire the lock it will prevent another "real"
first-mounter from acquiring the lock in EX mode, which means it
also cannot do journal recovery because it doesn't think it's the
first node to mount the file system.

This patch checks if the mounter is a spectator, and if so, avoids
grabbing the mounted lock. This allows a secondary mounter who is
really the first non-spectator mounter, to do journal recovery:
since the spectator doesn't acquire the lock, it can grab it in
EX mode, and therefore consider itself to be the first mounter
both as a "real" first mount, and as a first-real-after-spectator.

Note that the control lock still needs to be taken in PR mode
in order to fetch the lvb value so it has the current status of
all journals' recovery. This is used as it is today by a first
mounter to replay the journals. For spectators, it's merely
used to fetch the status bits. All recovery is bypassed and the
node waits until recovery is completed by a non-spectator node.

I also improved the cryptic message given by control_mount when
a spectator is waiting for a non-spectator to perform recovery.

It also fixes a problem in gfs2_recover_set whereby spectators
were never queueing recovery work for their own journal.
They cannot do recovery themselves, but they still need to queue
the work so they can check the recovery bits and clear the
DFL_BLOCK_LOCKS bit once the recovery happens on another node.

When the work queue runs on a spectator, it bypasses most of the
work so it won't print a bunch of annoying messages. All it will
print is a bunch of messages that look like this until recovery
completes on the non-spectator node:

GFS2: fsid=mycluster:scratch.s: recover generation 3 jid 0
GFS2: fsid=mycluster:scratch.s: recover jid 0 result busy

These continue every 1.5 seconds until the recovery is done by
the non-spectator, at which time it says:

GFS2: fsid=mycluster:scratch.s: recover generation 4 done

Then it proceeds with its mount.

If the file system is mounted in spectator mode and the last
remaining non-spectator is fenced, any IO to the file system is
blocked by dlm and the spectator waits until recovery is
performed by a non-spectator.

If a spectator tries to mount the file system before any
non-spectators, it blocks and repeatedly gives this kernel
message:

GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

4a772772

fs: gfs2: Adding new return type vm_fault_t · 109dbb1e

Committed by Souptick Joarder on Jul 02, 2018

Use new return type vm_fault_t for gfs2_page_mkwrite
handler.

see commit 1c8f4220 ("mm: change return type to
vm_fault_t") for reference.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

109dbb1e

gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr · 910f3d58

Committed by Chengguang Xu on Jun 22, 2018

It seems better to get size by calling posix_acl_xattr_size() instead of
calling posix_acl_to_xattr() with NULL buffer argument.

posix_acl_xattr_size() never returns 0, so remove the unnecessary check.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

910f3d58
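
Illustrative note, not part of the commit: why a size helper of this kind never returns 0 can be seen from a sketch of the computation. The layout below (a 4-byte version header followed by 8-byte entries) is an assumption about the POSIX ACL xattr format, and the struct and function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: a version header followed by fixed-size entries. */
struct acl_xattr_header {
	uint32_t a_version;
};

struct acl_xattr_entry {
	uint16_t e_tag;
	uint16_t e_perm;
	uint32_t e_id;
};

/* Size of the xattr value for an ACL with `count` entries. The header
 * is always present, so the result is never 0. */
static size_t acl_xattr_size(size_t count)
{
	return sizeof(struct acl_xattr_header) +
	       count * sizeof(struct acl_xattr_entry);
}
```

Under these assumptions an ACL with three entries needs 4 + 3 × 8 = 28 bytes, and even an empty one needs the 4-byte header.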

gfs2: Don't reject a supposedly full bitmap if we have blocks reserved · e79e0e14

Committed by Bob Peterson on Jun 18, 2018

Before this patch, you could get into situations like this:

1. Process 1 searches for X free blocks, finds them, makes a reservation
2. Process 2 searches for free blocks in the same rgrp, but now the
   bitmap is full because process 1's reservation is skipped over.
   So it marks the bitmap as GBF_FULL.
3. Process 1 tries to allocate blocks from its own reservation, but
   since the GBF_FULL bit is set, it skips over the rgrp and searches
   elsewhere, thus not using its own reservation.

This patch adds an additional check to allow processes to use their
own reservations.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

e79e0e14

12 Jul, 2018 · 4 commits

get rid of 'opened' argument of ->atomic_open() - part 3 · 44907d79

Committed by Al Viro on Jun 08, 2018

now it can be done...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

44907d79

getting rid of 'opened' argument of ->atomic_open() - part 2 · b452a458

Committed by Al Viro on Jun 08, 2018

__gfs2_lookup(), gfs2_create_inode(), nfs_finish_open() and fuse_create_open()
don't need 'opened' anymore.  Get rid of that argument in those.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b452a458

getting rid of 'opened' argument of ->atomic_open() - part 1 · be12af3e

Committed by Al Viro on Jun 08, 2018

'opened' argument of finish_open() is unused.  Kill it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be12af3e

introduce FMODE_CREATED and switch to it · 73a09dd9

Committed by Al Viro on Jun 08, 2018

Parallel to FILE_CREATED, goes into ->f_mode instead of *opened.
NFS is a bit of a wart here - it doesn't have file at the point
where FILE_CREATED used to be set, so we need to propagate it
there (for now).  IMA is another one (here and everywhere)...

Note that this needs do_dentry_open() to leave old bits in ->f_mode
alone - we want it to preserve FMODE_CREATED if it had been already
set (no other bit can be there).
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

73a09dd9

openanolis / cloud-kernel · synced successfully more than 1 year ago
