提交 · a72d2401f54b7db41c77ab971238a06eafe929fb · openeuler / Kernel

21 2月, 2020 1 次提交

gfs2: Allow some glocks to be used during withdraw · a72d2401

由 Bob Peterson 提交于 6月 13, 2019

We need to allow some glocks to be enqueued, dequeued, promoted, and demoted
when we're withdrawn. For example, to maintain metadata integrity, we should
disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like
iopen or the transaction glocks may be safely used because none of their
metadata goes through the journal. So in general, we should disallow all
glocks with an address space, and allow all the others. One exception is:
we need to allow our active journal to be demoted so others may recover it.

Allowing glocks after withdraw gives us the ability to take appropriate
action (in a following patch) to have our journal properly replayed by
another node rather than just abandoning the current transactions and
pretending nothing bad happened, leaving the other nodes free to modify
the blocks we had in our journal, which may result in file system
corruption.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

a72d2401

10 2月, 2020 3 次提交

gfs2: log error reform · 036330c9

由 Bob Peterson 提交于 4月 10, 2019

Before this patch, gfs2 kept track of journal io errors in two
places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
This patch consolidates the two into sd_log_error so that it
reflects the first error encountered writing to the journal.
In future patches, we will take advantage of this by checking
this value rather than having to check both when reacting to
io errors.

In addition, this fixes a tight loop in unmount: If buffers
get on the ail1 list and an io error occurs elsewhere, the
ail1 list would never be cleared because they were always busy.
So unmount would hang, waiting for the ail1 list to empty.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

036330c9

gfs2: Rework how rgrp buffer_heads are managed · b3422cac

由 Bob Peterson 提交于 11月 13, 2019

Before this patch, the rgrp code had a serious problem related to
how it managed buffer_heads for resource groups. The problem caused
file system corruption, especially in cases of journal replay.

When an rgrp glock was demoted to transfer ownership to a
different cluster node, do_xmote() first calls rgrp_go_sync and then
rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called
gfs2_rgrp_brelse() that dropped the buffer_head reference count.
In most cases, the reference count went to zero, which is right.
However, there were other places where the buffers are handled
differently.

After rgrp_go_sync, do_xmote called rgrp_go_inval which called
gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to
truncate_inode_pages_range would get rid of the pages in memory,
but only if the reference count drops to 0.

Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL.
So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer
to the buffer_heads in cases where the reference count was still 1.
Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time,
it failed the check for "if (bi->bi_bh)" and thus failed to call
brelse a second time. Because of that, the reference count on those
buffers sometimes failed to drop from 1 to 0. And that caused
function truncate_inode_pages_range to keep the pages in page cache
rather than freeing them.

The next time the rgrp glock was acquired, the metadata read of
the rgrp buffers re-used the pages in memory, which were now
wrong because they were likely modified by the other node who
acquired the glock in EX (which is why we demoted the glock).
This re-use of the page cache caused corruption because changes
made by the other nodes were never seen, so the bitmaps were
inaccurate.

For some reason, the problem became most apparent when journal
replay forced the replay of rgrps in memory, which caused newer
rgrp data to be overwritten by the older in-core pages.

A big part of the problem was that the rgrp buffer were released
in multiple places: The go_unlock function would release them when
the glock was released rather than when the glock is demoted,
which is clearly wrong because our intent was to cache them until
the glock is demoted from SH or EX.

This patch attempts to clean up the mess and make one consistent
and centralized mechanism for managing the rgrp buffer_heads by
implementing several changes:

1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync.
   We don't want to release the buffers or zero the pointers when
   syncing for the reasons stated above. It only makes sense to
   release them when the glock is actually invalidated (go_inval).
   And when we do, then we set the bh pointers to NULL.
2. The go_unlock function (which was only used for rgrps) is
   eliminated, as we've talked about doing many times before.
   The go_unlock function was called too early in the glock dq
   process, and should not happen until the glock is invalidated.
3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd.
   That will now happen automatically when the rgrp glocks are
   demoted, and shouldn't happen any sooner or later than that.
   Instead, function gfs2_clear_rgrpd has been modified to demote
   the rgrp glocks, and therefore, free those pages, before the
   remaining glocks are culled by gfs2_gl_hash_clear. This
   prevents the gl_object from hanging around when the glocks are
   culled.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

b3422cac

gfs2: Introduce concept of a pending withdraw · 69511080

由 Bob Peterson 提交于 2月 12, 2019

File system withdraws can be delayed when inconsistencies are
discovered when we cannot withdraw immediately, for example, when
critical spin_locks are held. But delaying the withdraw can cause
gfs2 to ignore the error and keep running for a short period of time.
For example, an rgrp glock may be dequeued and demoted while there
are still buffers that haven't been properly revoked, due to io
errors writing to the journal.

This patch introduces a new concept of a pending withdraw, which
means an inconsistency has been discovered and we need to withdraw
at the earliest possible opportunity. In these cases, we aren't
quite withdrawn yet, but we still need to not dequeue glocks and
other critical things. If we dequeue the glocks and the withdraw
results in our journal being replayed, the replay could overwrite
data that's been modified by a different node that acquired the
glock in the meantime.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

69511080

28 1月, 2020 1 次提交

Revert "gfs2: eliminate tr_num_revoke_rm" · a31b4ec5

由 Bob Peterson 提交于 1月 20, 2020

This reverts commit e955537e.

Before patch e955537e, tr_num_revoke tracked the number of revokes
added to the transaction, and tr_num_revoke_rm tracked how many
revokes were removed. But since revokes are queued off the sdp
(superblock) pointer, some transactions could remove more revokes
than they added. (e.g. revokes added by a different process).
Commit e955537e eliminated transaction variable tr_num_revoke_rm,
but in order to do so, it changed the accounting to always use
tr_num_revoke for its math. Since you can remove more revokes than
you add, tr_num_revoke could now become a negative value.
This negative value broke the assert in function gfs2_trans_end:

	if (gfs2_assert_withdraw(sdp, (nbuf <=3D tr->tr_blocks) &&
			       (tr->tr_num_revoke <=3D tr->tr_revokes)))

One way to fix this is to simply remove the tr_num_revoke clause
from the assert and allow the value to become negative. Andreas
didn't like that idea, so instead, we decided to revert e955537e.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

a31b4ec5

20 1月, 2020 2 次提交

gfs2: Remove GFS2_MIN_LVB_SIZE define · f7be987b

由 Andreas Gruenbacher 提交于 1月 16, 2020

The dlm lockspace is set up to have lock value blocks of GDLM_LVB_SIZE bytes,
and dlm is the only lock manager we support, so there is no point in claiming
that the lock value block could have any other size.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

f7be987b

gfs2: Fix incorrect variable name · 5d439758

由 Andreas Gruenbacher 提交于 1月 09, 2020

Rename sd_log_commited_revoke to sd_log_committed_revoke.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

5d439758

08 1月, 2020 1 次提交

gfs2: eliminate ssize parameter from gfs2_struct2blk · 2e9eeaa1

由 Bob Peterson 提交于 12月 13, 2019

Every caller of function gfs2_struct2blk specified sizeof(u64).

This patch eliminates the unnecessary parameter and replaces the
size calculation with a new superblock variable that is computed
to be the maximum number of block pointers we can fit inside a
log descriptor, as is done for pointers per dinode and indirect
block.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

2e9eeaa1

19 9月, 2019 1 次提交

gfs2: Convert gfs2 to fs_context · 1f52aa08

由 Andrew Price 提交于 3月 27, 2019

Convert gfs2 and gfs2meta to fs_context. Removes the duplicated vfs code
from gfs2_mount and instead uses the new vfs_get_block_super() before
switching the ->root to the appropriate dentry.

The mount option parsing has been converted to the new API and error
reporting for invalid options has been made more precise at the same
time.

All of the mount/remount code has been moved into ops_fstype.c
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: cluster-devel@redhat.com
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1f52aa08

05 9月, 2019 1 次提交

gfs2: Use async glocks for rename · ad26967b

由 Bob Peterson 提交于 8月 30, 2019

Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
reverse the roles of which directories are "old" and which are "new" for
the purposes of rename. This can cause deadlocks where two nodes end up
waiting for each other.

There can be several layers of directory dependencies across many nodes.

This patch fixes the problem by acquiring all gfs2_rename's inode glocks
asychronously and waiting for all glocks to be acquired.  That way all
inodes are locked regardless of the order.

The timeout value for multiple asynchronous glocks is calculated to be
the total of the individual wait times for each glock times two.

Since gfs2_exchange is very similar to gfs2_rename, both functions are
patched in the same way.

A new async glock wait queue, sd_async_glock_wait, keeps a list of
waiters for these events. If gfs2's holder_wake function detects an
async holder, it wakes up any waiters for the event. The waiter only
tests whether any of its requests are still pending.

Since the glocks are sent to dlm asychronously, the wait function needs
to check to see which glocks, if any, were granted.

If a glock is granted by dlm (and therefore held), its minimum hold time
is checked and adjusted as necessary, as other glock grants do.

If the event times out, all glocks held thus far must be dequeued to
resolve any existing deadlocks.  Then, if there are any outstanding
locking requests, we need to loop around and wait for dlm to respond to
those requests too.  After we release all requests, we return -ESTALE to
the caller (vfs rename) which loops around and retries the request.

    Node1           Node2
    ---------       ---------
1.  Enqueue A       Enqueue B
2.  Enqueue B       Enqueue A
3.  A granted
6.                  B granted
7.  Wait for B
8.                  Wait for A
9.                  A times out (since Node 1 holds A)
10.                 Dequeue B (since it was granted)
11.                 Wait for all requests from DLM
12. B Granted (since Node2 released it in step 10)
13. Rename
14. Dequeue A
15.                 DLM Grants A
16.                 Dequeue A (due to the timeout and since we
                    no longer have B held for our task).
17. Dequeue B
18.                 Return -ESTALE to vfs
19.                 VFS retries the operation, goto step 1.

This release-all-locks / acquire-all-locks may slow rename / exchange
down as both nodes struggle in the same way and do the same thing.
However, this will only happen when there is contention for the same
inodes, which ought to be rare.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

ad26967b

28 6月, 2019 3 次提交

gfs2: dump fsid when dumping glock problems · 3792ce97

由 Bob Peterson 提交于 5月 09, 2019

Before this patch, if a glock error was encountered, the glock with
the problem was dumped. But sometimes you may have lots of file systems
mounted, and that doesn't tell you which file system it was for.

This patch adds a new boolean parameter fsid to the dump_glock family
of functions. For non-error cases, such as dumping the glocks debugfs
file, the fsid is not dumped in order to keep lock dumps and glocktop
as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
file system id is now printed. This will make it easier to debug.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

3792ce97

gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN · 04aea0ca

由 Bob Peterson 提交于 5月 07, 2019

Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

04aea0ca

gfs2: eliminate tr_num_revoke_rm · e955537e

由 Bob Peterson 提交于 3月 26, 2019

For its journal processing, gfs2 kept track of the number of buffers
added and removed on a per-transaction basis. These values are used
to calculate space needed in the journal. But while these calculations
make sense for the number of buffers, they make no sense for revokes.
Revokes are managed in their own list, linked from the superblock.
So it's entirely unnecessary to keep separate per-transaction counts
for revokes added and removed. A single count will do the same job.
Therefore, this patch combines the transaction revokes into a single
count.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

e955537e

06 6月, 2019 1 次提交

Revert "gfs2: Replace gl_revokes with a GLF flag" · 638803d4

由 Bob Peterson 提交于 6月 06, 2019

Commit 73118ca8 introduced a glock reference counting bug in
gfs2_trans_remove_revoke.  Given that, replacing gl_revokes with a GLF flag is
no longer useful, so revert that commit.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

638803d4

05 6月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 · 7336d0e6

由 Thomas Gleixner 提交于 5月 31, 2019

Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7336d0e6

08 5月, 2019 4 次提交

gfs2: fix race between gfs2_freeze_func and unmount · 8f918219

由 Abhi Das 提交于 4月 30, 2019

As part of the freeze operation, gfs2_freeze_func() is left blocking
on a request to hold the sd_freeze_gl in SH. This glock is held in EX
by the gfs2_freeze() code.

A subsequent call to gfs2_unfreeze() releases the EXclusively held
sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and
resume its operation.

gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete.
If a umount is issued right after unfreeze, it could result in an
inconsistent filesystem because some journal data (statfs update) isn't
written out.

Refer to commit 24972557 for a more detailed explanation of how
freeze/unfreeze work.

This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to
complete before returning to the user.
Signed-off-by: NAbhi Das <adas@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

8f918219

gfs2: Rename sd_log_le_{revoke,ordered} · a5b1d3fc

由 Andreas Gruenbacher 提交于 4月 05, 2019

Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to
sd_log_ordered: not sure what le stands for here, but it doesn't add
clarity, and if it stands for list entry, it's actually confusing as
those are both list heads but not list entries.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

a5b1d3fc

gfs2: Replace gl_revokes with a GLF flag · 73118ca8

由 Bob Peterson 提交于 4月 05, 2019

The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes. However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero. Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

73118ca8

gfs2: clean_journal improperly set sd_log_flush_head · 7c70b896

由 Bob Peterson 提交于 3月 25, 2019

This patch fixes regressions in 588bff95.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, new function
gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.

This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.

Fixes: 588bff95 ("GFS2: Reduce code redundancy writing log headers")
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

7c70b896

23 1月, 2019 1 次提交

gfs: no need to check return value of debugfs_create functions · 2abbf9a4

由 Greg Kroah-Hartman 提交于 1月 22, 2019

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

There is no need to save the dentries for the debugfs files, so drop
those variables to save a bit of space and make the code simpler.

Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

2abbf9a4

12 12月, 2018 2 次提交

gfs2: Dump nrpages for inodes and their glocks · 27a2660f

由 Bob Peterson 提交于 4月 18, 2018

This patch is based on an idea from Steve Whitehouse. The idea is
to dump the number of pages for inodes in the glock dumps.
The additional locking required me to drop const from quite a few
places.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

27a2660f

gfs2: Remove vestigial bd_ops · cbbe76c8

由 Bob Peterson 提交于 11月 16, 2018

Field bd_ops was set but never used, so I removed it, and all
code supporting it.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

cbbe76c8

12 10月, 2018 2 次提交

gfs2: Rename bitmap.bi_{len => bytes} · 281b4952

由 Andreas Gruenbacher 提交于 9月 26, 2018

This field indicates the size of the bitmap in bytes, similar to how the
bi_blocks field indicates the size of the bitmap in blocks.

In count_unlinked, replace an instance of bi_bytes * GFS2_NBBY by
bi_blocks.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NSteven Whitehouse <swhiteho@redhat.com>

281b4952

gfs2: Move rs_{sizehint, rgd_gh} fields into the inode · 21f09c43

由 Andreas Gruenbacher 提交于 8月 30, 2018

Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv
into the inode: they are more closely related to the inode than to a
particular reservation.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NSteven Whitehouse <swhiteho@redhat.com>

21f09c43

05 10月, 2018 1 次提交

gfs2: slow the deluge of io error messages · b524abcc

由 Bob Peterson 提交于 10月 04, 2018

When an io error is hit, it calls gfs2_io_error_bh_i for every
journal buffer it can't write. Since we changed gfs2_io_error_bh_i
recently to withdraw later in the cycle, it sends a flood of
errors to the console. This patch checks for the file system already
being withdrawn, and if so, doesn't send more messages. It doesn't
stop the flood of messages, but it slows it down and keeps it more
reasonable.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

b524abcc

07 8月, 2018 1 次提交

gfs2: Fix gfs2_testbit to use clone bitmaps · dffe12a8

由 Bob Peterson 提交于 8月 07, 2018

Function gfs2_testbit is called in three places. Two of those places,
gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
by the block reservations scheme to determine the length of an extent of
free blocks. Before this patch, it wasn't using the clone bitmap, which
means recently-freed blocks were treated as free blocks for the purposes
of an allocation.

This patch adds a new parameter to gfs2_testbit to indicate whether or
not the clone bitmaps should be used (if available).
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

dffe12a8

05 7月, 2018 1 次提交

gfs2: Eliminate redundant ip->i_rgd · b7eba890

由 Andreas Gruenbacher 提交于 6月 21, 2018

GFS2 remembers the last rgrp used for allocations in ip->i_rgd.
However, block allocations are made by way of a reservations structure,
ip->i_res, which keeps the last rgrp in ip->i_res.rs_rgd, and ip->i_res
is kept in sync with ip->i_res.rs_rgd, so it's redundant.  Get rid of
ip->i_rgd and just use ip->i_res.rs_rgd in its place.

Based on patches by Robert Peterson.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

b7eba890

21 6月, 2018 1 次提交

gfs2: eliminate rs_inum and reduce the size of gfs2 inodes · f85c10e2

由 Bob Peterson 提交于 6月 13, 2018

Before this patch, block reservations kept track of the inode
number. At one point, that was a valid thing to do. However, since
we made the reservation a part of the inode (rather than a pointer
to a separate allocated object) the reservation can determine the
inode number by using container_of. This saves us a little memory
in our inode.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

f85c10e2

04 6月, 2018 1 次提交

GFS2: gfs2_free_extlen can return an extent that is too long · dc8fbb03

由 Bob Peterson 提交于 6月 01, 2018

Function gfs2_free_extlen calculates the length of an extent of
free blocks that may be reserved. The end pointer was calculated as
end = start + bh->b_size but b_size is incorrect because the
bitmap usually stops prior to the end of the buffer data on
the last bitmap.

What this means is that when you do a write, you can reserve a
chunk of blocks that runs off the end of the last bitmap. For
example, I've got a file system where there is only one bitmap
for each rgrp, so ri_length==1. I saw cases in which iozone
tried to do a big write, grabbed a large block reservation,
chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188.
So 5464153 + 8188 = 5472341 which is the end of the rgrp.

When it grabbed a reservation it got back: 5470936, length 7229.
But 5470936 + 7229 = 5478165. So the reservation starts inside
the rgrp but runs 5824 blocks past the end of the bitmap.

This patch fixes the calculation so it won't exceed the last
bitmap. It also adds a BUG_ON to guard against overflows in the
future.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

dc8fbb03

17 4月, 2018 1 次提交

gfs2: Remove sdp->sd_jheightsize · 9a38662b

由 Andreas Gruenbacher 提交于 4月 16, 2018

GFS2 keeps two arrarys in the superblock that define the maximum size of
an inode depending on the inode's height: sdp->sd_heightsize defines the
heights in units of sb->s_blocksize; sdp->sd_jheightsize defines them in
units of sb->s_blocksize - sizeof(struct gfs2_meta_header).  These
arrays are used to determine when additional layers of indirect blocks
are needed.  The second array is used for directories which have an
additional gfs2_meta_header at the beginning of each block.

Distinguishing between these two cases makes no sense: the height
required for representing N blocks will come out the same no matter if
the calculation is done in gross (sb->s_blocksize) or net
(sb->s_blocksize - sizeof(struct gfs2_meta_header)) units.

Stuffed directories don't have an additional gfs2_meta_header, but the
stuffed case is handled separately for both files and directories,
anyway.

Remove the unncessary sdp->sd_jheightsize array.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

9a38662b

29 3月, 2018 1 次提交

gfs2: Zero out fallocated blocks in fallocate_chunk · fffb6412

由 Andreas Gruenbacher 提交于 3月 29, 2018

Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
out in fallocate_chunk, much higher up the call stack. This gets rid of
gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
buffer flag. I can't think of a reason why zeroing out the blocks in
gfs2_iomap_alloc would have any benefits: there is no additional locking
at that level that would add protection to the newly allocated blocks.

While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NChristoph Hellwig <hch@lst.de>

fffb6412

22 1月, 2018 1 次提交

gfs2: Get rid of gfs2_log_header_in · 0ff5916a

由 Andreas Gruenbacher 提交于 1月 16, 2018

Get rid of gfs2_log_header_in by integrating it into get_log_header.
Clean up the crc32 computations and use the same functions for encoding
and decoding to make things less confusing.  Eliminate lh_hash from
gfs2_log_header_host which is completely useless.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

0ff5916a

19 1月, 2018 1 次提交

gfs2: Add gfs2_max_stuffed_size · 235628c5

由 Andreas Gruenbacher 提交于 11月 14, 2017

Add a small inline function for computing the maximum size of a stuffed
inode instead of open coding that in several places throughout the code.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

235628c5

25 8月, 2017 2 次提交

gfs2: Silence gcc format-truncation warning · 561b7969

由 Andreas Gruenbacher 提交于 8月 22, 2017

Enlarge sd_fsname to be big enough for the longest long lock table name
and an arbitrary journal number.  This silences two -Wformat-truncation
warnings with gcc 7.1.1.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

561b7969

GFS2: Withdraw for IO errors writing to the journal or statfs · 942b0cdd

由 Bob Peterson 提交于 8月 16, 2017

Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report the problem, so they would go
unnoticed, sometimes for many hours. Sometimes this would only be
noticed later, when recovery tried to do journal replay and failed
due to invalid metadata at the blocks that resulted in IO errors.

This patch makes GFS2's log daemon check for IO errors. If it
encounters one, it withdraws from the file system and reports
why in dmesg. A similar action is taken when IO errors occur when
writing to the system statfs file.

These errors are also reported back to any callers of fsync, since
that requires the journal to be flushed. Therefore, any IO errors
that would previously go unnoticed are now noticed and the file
system is withdrawn as early as possible, thus preventing further
file system damage.

Also note that this reintroduces superblock variable sd_log_error,
which Christoph removed with commit f729b66f.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

942b0cdd

10 8月, 2017 1 次提交

gfs2: forcibly flush ail to relieve memory pressure · b066a4ee

由 Abhi Das 提交于 8月 04, 2017

On systems with low memory, it is possible for gfs2 to infinitely
loop in balance_dirty_pages() under heavy IO (creating sparse files).

balance_dirty_pages() attempts to write out the dirty pages via
gfs2_writepages() but none are found because these dirty pages are
being used by the journaling code in the ail. Normally, the journal
has an upper threshold which when hit triggers an automatic flush
of the ail. But this threshold can be higher than the number of
allowable dirty pages and result in the ail never being flushed.

This patch forces an ail flush when gfs2_writepages() fails to write
anything. This is a good indication that the ail might be holding
some dirty pages.
Signed-off-by: NAbhi Das <adas@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

b066a4ee

08 7月, 2017 1 次提交

gfs2: Fix glock rhashtable rcu bug · 961ae1d8

由 Andreas Gruenbacher 提交于 7月 07, 2017

Before commit 88ffbf3e "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu.  This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe.  Bring back the original
code for freeing glocks via call_rcu.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # 4.3+

961ae1d8

05 7月, 2017 2 次提交

gfs2: Protect gl->gl_object by spin lock · 6f6597ba

由 Andreas Gruenbacher 提交于 6月 30, 2017

Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

6f6597ba

gfs2: Get rid of flush_delayed_work in gfs2_evict_inode · 4fd1a579

由 Andreas Gruenbacher 提交于 6月 30, 2017

So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
work queue to make sure that inode glops which dereference gl->gl_object
have finished running before the inode is destroyed.  However, flushing
the work queue may do more work than needed, and in particular, it may
call into DLM, which we want to avoid here.  Use a bit lock
(GIF_GLOP_PENDING) to synchronize between the inode glops and
gfs2_evict_inode instead to get rid of the flushing.

In addition, flush the work queues of existing glocks before reusing
them for new inodes to get those glocks into a known state: the glock
state engine currently doesn't handle glock re-appropriation correctly.
(We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

4fd1a579

20 6月, 2017 1 次提交

GFS2: Eliminate vestigial sd_log_flush_wrapped · 722f6f62

由 Bob Peterson 提交于 6月 20, 2017

Superblock variable sd_log_flush_wrapped is set, but never referenced,
so this patch eliminates it.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

722f6f62

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功