提交 · 61b91cfdc6c0c49a8cc8258cbee846551029d694 · openeuler / Kernel

09 8月, 2017 6 次提交

由 Andreas Gruenbacher 提交于 8月 01, 2017

Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

61b91cfd

GFS2: Delete debugfs files only after we evict the glocks · b2fb7dab

由 Bob Peterson 提交于 7月 28, 2017

This patch moves the call to gfs2_delete_debugfs_file so that it
comes after the glock hash table has been cleared. This way we
can query the debugfs files if umount hangs.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

b2fb7dab

GFS2: Don't waste time locking lru_lock for non-lru glocks · 645ebd49

由 Bob Peterson 提交于 7月 26, 2017

Before this patch, glock_dq would call gfs2_glock_remove_from_lru.
For glocks that are never put on the LRU, such as the transaction
glock, this just takes the spin_lock, determines there's nothing to
be done because the list is empty, then unlocks again. This was
causing unnecessary lock contention on the lru_lock spin_lock.
This patch adds a check for GLOF_LRU in the glops before taking
the spin_lock.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

645ebd49

GFS2: Don't bother trying to add rgrps to the lru list · 2d821a8b

由 Bob Peterson 提交于 7月 26, 2017

This patch removes a call to gfs2_glock_add_to_lru from function
gfs2_clear_rgrpd. The call is just a waste of time because as soon
as it adds it to the lru_list, the call to gfs2_glock_put takes it
back off again.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

2d821a8b

GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode · 240c6235

由 Bob Peterson 提交于 7月 18, 2017

This patch adds some calls to clear gl_object in function
gfs2_delete_inode. Since we are deleting the inode, and the glock
typically outlives the inode in core, we must clear gl_object
so subsequent use of the glock (e.g. for a new inode in its place)
will not have the old pointer sitting there. In error cases we
need to tidy up after ourselves. In non-error cases, we need to
clear gl_object before we set the block free in the bitmap so
residules aren't left for potential inode creators.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

240c6235

GFS2: Clear gl_object if gfs2_create_inode fails · 9c1b2808

由 Bob Peterson 提交于 7月 18, 2017

If function gfs2_create_inode fails after the inode has been
created (for example, if the inode_refresh fails for some reason)
the function was setting gl_object but never clearing it again.
The glocks are left pointing to a freed inode. This patch adds
the calls to clear gl_object in the appropriate error paths.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

9c1b2808

21 7月, 2017 4 次提交

GFS2: Set gl_object in inode lookup only after block type check · 4d7c18c7

由 Bob Peterson 提交于 7月 18, 2017

Before this patch, the inode glock's gl_object was set after a
reference was acquired, but before the block type was verified.
In cases where the block was unlinked, then freed and reused on
another node, a residule delete callback (delete_work) would try
to look up the inode, eventually failing the block check, but
only after it overwrites gl_object with a pointer to the wrong
inode. This patch moves the assignment of gl_object after the
block check so it won't be improperly overwritten.

Likewise, at the end of the function, gfs2_inode_lookup was
clearing gl_object after it unlocked the glock, which meant
another process might free the glock in the meantime. This
patch guards against that case.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

4d7c18c7

GFS2: Introduce helper for clearing gl_object · df3d87bd

由 Bob Peterson 提交于 7月 18, 2017

This patch introduces a new helper function in glock.h that
clears gl_object, with an added integrity check. An additional
integrity check has been added to glock_set_object, plus comments.
This is step 1 in a series to ensure gl_object integrity.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

df3d87bd

gfs2: add flag REQ_PRIO for metadata I/O · e477b24b

由 Coly Li 提交于 7月 21, 2017

When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
the bio. But flag REQ_META is just a hint for block trace, not for block
layer code to handle a bio as metadata request.

For some of metadata I/Os of gfs2, A REQ_PRIO flag on the metadata bio
would be very informative to block layer code. For example, if bcache is
used as a I/O cache for gfs2, it will be possible for bcache code to get
the hint and cache the pre-fetched metadata blocks on cache device. This
behavior may be helpful to improve metadata I/O performance if the
following requests hit the cache.

Here are the locations in gfs2 code where a REQ_PRIO flag should be added,
- All places where REQ_READAHEAD is used, gfs2 code uses this flag for
  metadata read ahead.
- In gfs2_meta_rq() where the first metadata block is read in.
- In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
  up to date.
These metadata blocks are probably to be accessed again in future, adding
a REQ_PRIO flag may have bcache to keep such metadata in fast cache
device. For system without a cache layer, REQ_PRIO can still provide hint
to block layer to handle metadata requests more properly.
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

e477b24b

GFS2: fix code parameter error in inode_go_lock · e7cb550d

由 Wang Xibo 提交于 7月 21, 2017

In inode_go_lock() function, the parameter order of list_add() is error.
According to the define of list_add(), the first parameter is new entry
and the second is the list head, so ip->i_trunc_list should be the
first parameter and the sdp->sd_trunc_list should be second.

Signed-off-by: Wang Xibo<wang.xibo@zte.com.cn>
Signed-off-by: Xiao Likun<xiao.likun@zte.com.cn>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

e7cb550d

20 7月, 2017 1 次提交

gfs2: Fixup to "Get rid of flush_delayed_work in gfs2_evict_inode" · 98e5a91a

由 Andreas Gruenbacher 提交于 7月 19, 2017

When commit 4fd1a579 moved the call to flush_delayed_work from
gfs2_evict_inode to gfs2_inode_lookup to avoid calling into DLM during
evict, a similar call should have been added to gfs2_create_inode:
that's another code path in which glocks of previous inodes may be
reused.

The flush of the iopen glock work queue added by 4fd1a579, on the
other hand, is unnecessary and can be removed.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

98e5a91a

19 7月, 2017 1 次提交

gfs2: Don't clear SGID when inheriting ACLs · 914cea93

由 Jan Kara 提交于 7月 19, 2017

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__gfs2_set_acl() into gfs2_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 07393101Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

914cea93

18 7月, 2017 1 次提交

gfs2: Lock holder cleanup (fixup) · 283c9a97

由 Andreas Gruenbacher 提交于 7月 17, 2017

Function gfs2_holder_initialized should be used in do_flock as well.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

283c9a97

17 7月, 2017 1 次提交

GFS2: Prevent double brelse in gfs2_meta_indirect_buffer · 61eaadcd

由 Bob Peterson 提交于 7月 03, 2017

Before this patch, problems reading in indirect buffers would send
an IO error back to the caller, and release the buffer_head with
brelse() in function gfs2_meta_indirect_buffer, however, it would
still return the address of the buffer_head it released. After the
error was discovered, function gfs2_block_map would call function
release_metapath to free all buffers. That checked:
if (mp->mp_bh[i] == NULL) but since the value was set after the
error, it was non-zero, so brelse was called a second time. This
resulted in the following error:

kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G        W  -- ------------   )
kernel: Hardware name: RHEV Hypervisor
kernel: VFS: brelse: Trying to free free buffer

This patch changes gfs2_meta_indirect_buffer so it only sets
the buffer_head pointer in cases where it isn't released.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>

61eaadcd

08 7月, 2017 1 次提交

gfs2: Fix glock rhashtable rcu bug · 961ae1d8

由 Andreas Gruenbacher 提交于 7月 07, 2017

Before commit 88ffbf3e "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu.  This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe.  Bring back the original
code for freeing glocks via call_rcu.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # 4.3+

961ae1d8

06 7月, 2017 1 次提交

buffer: set errors in mapping at the time that the error occurs · 87354e5d

由 Jeff Layton 提交于 7月 06, 2017

I noticed on xfs that I could still sometimes get back an error on fsync
on a fd that was opened after the error condition had been cleared.

The problem is that the buffer code sets the write_io_error flag and
then later checks that flag to set the error in the mapping. That flag
perisists for quite a while however. If the file is later opened with
O_TRUNC, the buffers will then be invalidated and the mapping's error
set such that a subsequent fsync will return error. I think this is
incorrect, as there was no writeback between the open and fsync.

Add a new mark_buffer_write_io_error operation that sets the flag and
the error in the mapping at the same time. Replace all calls to
set_buffer_write_io_error with mark_buffer_write_io_error, and remove
the places that check this flag in order to set the error in the
mapping.

This sets the error in the mapping earlier, at the time that it's first
detected.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>

87354e5d

05 7月, 2017 5 次提交

GFS2: constify attribute_group structures. · 29695254

由 Arvind Yadav 提交于 6月 30, 2017

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   5259	   1344	      8	   6611	   19d3	fs/gfs2/sys.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   5371	   1216	      8	   6595	   19c3	fs/gfs2/sys.o
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

29695254

gfs2: gfs2_create_inode: Keep glock across iput · e0b62e21

由 Andreas Gruenbacher 提交于 6月 30, 2017

On failure, keep the inode glock across the final iput of the new inode
so that gfs2_evict_inode doesn't have to re-acquire the glock. That
way, gfs2_evict_inode won't need to revalidate the block type.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

e0b62e21

gfs2: Clean up glock work enqueuing · 6b0c7440

由 Andreas Gruenbacher 提交于 6月 30, 2017

This patch adds a standardized queueing mechanism for glock work
with spin_lock protection to prevent races.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

6b0c7440

gfs2: Protect gl->gl_object by spin lock · 6f6597ba

由 Andreas Gruenbacher 提交于 6月 30, 2017

Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

6f6597ba

gfs2: Get rid of flush_delayed_work in gfs2_evict_inode · 4fd1a579

由 Andreas Gruenbacher 提交于 6月 30, 2017

So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
work queue to make sure that inode glops which dereference gl->gl_object
have finished running before the inode is destroyed.  However, flushing
the work queue may do more work than needed, and in particular, it may
call into DLM, which we want to avoid here.  Use a bit lock
(GIF_GLOP_PENDING) to synchronize between the inode glops and
gfs2_evict_inode instead to get rid of the flushing.

In addition, flush the work queues of existing glocks before reusing
them for new inodes to get those glocks into a known state: the glock
state engine currently doesn't handle glock re-appropriation correctly.
(We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

4fd1a579

20 6月, 2017 1 次提交

GFS2: Eliminate vestigial sd_log_flush_wrapped · 722f6f62

由 Bob Peterson 提交于 6月 20, 2017

Superblock variable sd_log_flush_wrapped is set, but never referenced,
so this patch eliminates it.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

722f6f62

13 6月, 2017 2 次提交

GFS2: Remove gl_list from glock structure · df68f20f

由 Bob Peterson 提交于 6月 05, 2017

The gl_list is no longer used nor needed in the glock structure,
so this patch eliminates it.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

df68f20f

GFS2: Withdraw when directory entry inconsistencies are detected · d87d62b7

由 Bob Peterson 提交于 5月 26, 2017

This patch prints an inode consistency error and withdraws the file
system when directory entry counts are mismatched.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

d87d62b7

09 6月, 2017 2 次提交

block: switch bios to blk_status_t · 4e4cbee9

由 Christoph Hellwig 提交于 6月 03, 2017

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

4e4cbee9

gfs2: remove the unused sd_log_error field · f729b66f

由 Christoph Hellwig 提交于 6月 03, 2017

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f729b66f

05 6月, 2017 1 次提交

fs: switch ->s_uuid to uuid_t · 85787090

由 Christoph Hellwig 提交于 5月 10, 2017

For some file systems we still memcpy into it, but in various places this
already allows us to use the proper uuid helpers. More to come..
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM)
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>

85787090

24 5月, 2017 1 次提交

gfs2: Make flush bios explicitely sync · 0f0b9b63

由 Jan Kara 提交于 5月 02, 2017

Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d6
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
CC: stable@vger.kernel.org
Acked-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>

0f0b9b63

09 5月, 2017 1 次提交

gfs2: replace CURRENT_TIME with current_time · b32c8c76

由 Stephen Rothwell 提交于 5月 08, 2017

Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.auSigned-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b32c8c76

06 5月, 2017 1 次提交

GFS2: Allow glocks to be unlocked after withdraw · ed17545d

由 Bob Peterson 提交于 5月 05, 2017

This bug fixes a regression introduced by patch 0d1c7ae9.

The intent of the patch was to stop promoting glocks after a
file system is withdrawn due to a variety of errors, because doing
so results in a BUG(). (You should be able to unmount after a
withdraw rather than having the kernel panic.)

Unfortunately, it also stopped demotions, so glocks could not be
unlocked after withdraw, which means the unmount would hang.

This patch allows function do_xmote to demote locks to an
unlocked state after a withdraw, but not promote them.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

ed17545d

21 4月, 2017 2 次提交

fs: Remove SB_I_DYNBDI flag · c1844d53

由 Jan Kara 提交于 4月 12, 2017

Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

c1844d53

gfs2: Convert to properly refcounting bdi · 95fe66de

由 Jan Kara 提交于 4月 12, 2017

Similarly to set_bdev_super() GFS2 just used block device reference to
bdi. Convert it to properly getting bdi reference. The reference will
get automatically dropped on superblock destruction.

CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Bob Peterson <rpeterso@redhat.com>
CC: cluster-devel@redhat.com
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

95fe66de

19 4月, 2017 1 次提交

GFS2: Non-recursive delete · d552a2b9

由 Bob Peterson 提交于 2月 06, 2017

Implement truncate/delete as a non-recursive algorithm. The older
algorithm was implemented with recursion to strip off each layer
at a time (going by height, starting with the maximum height.
This version tries to do the same thing but without recursion,
and without needing to allocate new structures or lists in memory.

For example, say you want to truncate a very large file to 1 byte,
and its end-of-file metapath is: 0.505.463.428. The starting
metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
needs to preserve that byte, and all metadata pointing to it.
So it would start at 0.0.0.0, look up all its metadata buffers,
then free all data blocks pointed to at the highest level.
After that buffer is "swept", it moves on to 0.0.0.1, then
0.0.0.2, etc., reading in buffers and sweeping them clean.
When it gets to the end of the 0.0.0 metadata buffer (for 4K
blocks the last valid one is 0.0.0.508), it backs up to the
previous height and starts working on 0.0.1.0, then 0.0.1.1,
and so forth. After it reaches the end and sweeps 0.0.1.508,
it continues with 0.0.2.0, and so on. When that height is
exhausted, and it reaches 0.0.508.508 it backs up another level,
to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
marching backwards and forwards through the metadata until it's
all swept clean. Once it has all the data blocks freed, it
lowers the strip height, and begins the process all over again,
but with one less height. This time it sweeps 0.0.0 through
0.505.463. When that's clean, it lowers the strip height again
and works to free 0.505. Eventually it strips the lowest height, 0.
For a delete or truncate to 0, all metadata for all heights of
0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
be preserved.

This isn't much different from normal integer incrementing,
where an integer gets incremented from 0000 (0.0.0.0) to 3021
(3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
just that each "digit" goes from 0 to 508 (for a total of 509
pointers) rather than from 0 to 9.

Note that the dinode will only have 483 pointers due to the
dinode structure itself.

Also note: this is just an example. These numbers (509 and 483)
are based on a standard 4K block size. Smaller block sizes will
yield smaller numbers of indirect pointers accordingly.

The truncation process is accomplished with the help of two
major functions and a few helper functions.

Functions do_strip and recursive_scan are obsolete, so removed.

New function sweep_bh_for_rgrps cleans a buffer_head pointed to
by the given metapath and height. By cleaning, I mean it frees
all blocks starting at the offset passed in metapath. It starts
at the first block in the buffer pointed to by the metapath and
identifies its resource group (rgrp). From there it frees all
subsequent block pointers that lie within that rgrp. If it's
already inside a transaction, it stays within it as long as it
can. In other words, it doesn't close a transaction until it knows
it's freed what it can from the resource group. In this way,
multiple buffers may be cleaned in a single transaction, as long
as those blocks in the buffer all lie within the same rgrp.

If it's not in a transaction, it starts one. If the buffer_head
has references to blocks within multiple rgrps, it frees all the
blocks inside the first rgrp it finds, then closes the
transaction. Then it repeats the cycle: identifies the next
unfreed block, uses it to find its rgrp, then starts a new
transaction for that set. It repeats this process repeatedly
until the buffer_head contains no more references to any blocks
past the given metapath.

Function trunc_dealloc has been reworked into a finite state
automaton. It has basically 3 active states:
DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:

The DEALLOC_MP_FULL state implies the metapath has a full set
of buffers out to the "shrink height", and therefore, it can
call function sweep_bh_for_rgrps to free the blocks within the
highest height of the metapath. If it's just swept the lowest
level (or an error has occurred) the state machine is ended.
Otherwise it proceeds to the DEALLOC_MP_LOWER state.

The DEALLOC_MP_LOWER state implies we are finished with a given
buffer_head, which may now be released, and therefore we are
then missing some buffer information from the metapath. So we
need to find more buffers to read in. In most cases, this is
just a matter of releasing the buffer_head and moving to the
next pointer from the previous height, so it may be read in and
swept as well. If it can't find another non-null pointer to
process, it checks whether it's reached the end of a height
and needs to lower the strip height, or whether it still needs
move forward through the previous height's metadata. In this
state, all zero-pointers are skipped. From this state, it can
only loop around (once more backing up another height) or,
once a valid metapath is found (one that has non-zero
pointers), proceed to state DEALLOC_FILL_MP.

The DEALLOC_FILL_MP state implies that we have a metapath
but not all its buffers are read in. So we must proceed to read
in buffer_heads until the metapath has a valid buffer for every
height. If the previous state backed us up 3 heights, we may
need to read in a buffer, increment the height, then repeat the
process until buffers have been read in for all required heights.
If it's successful reading a buffer, and it's at the highest
height we need, it proceeds back to the DEALLOC_MP_FULL state.
If it's unable to fill in a buffer, (encounters a hole, etc.)
it tries to find another non-zero block pointer. If they're all
zero, it lowers the height and returns to the DEALLOC_MP_LOWER
state. If it finds a good non-null pointer, it loops around and
reads it in, while keeping the metapath in lock-step with the
pointers it examines.

The state machine runs until the truncation request is
satisfied. Then any transactions are ended, the quota and
statfs data are updated, and the function is complete.

Helper function metaptr1 was introduced to be an easy way to
determine the start of a buffer_head's indirect pointers.

Helper function lookup_mp_height was introduced to find a
metapath index and read in the buffer that corresponds to it.
In this way, function lookup_metapath becomes a simple loop to
call it for every height.

Helper function fillup_metapath is similar to lookup_metapath
except it can do partial lookups. If the state machine
backed up multiple levels (like 2999 wrapping to 3000) it
needs to find out the next starting point and start issuing
metadata reads at that point.

Helper function hptrs is a shortcut to determine how many
pointers should be expected in a buffer. Height 0 is the dinode
which has fewer pointers than the others.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

d552a2b9

05 4月, 2017 1 次提交

gfs2: Re-enable fallocate for the rindex · d4d7fc12

由 Andrew Price 提交于 4月 05, 2017

Commit 86066914 "gfs2: Don't support
fallocate on jdata files" removed the ability of gfs2_grow to reserve
space at the end of the rindex, which could prevent a second gfs2_grow
from succeeding if the fs is full. Allow fallocate to work on the rindex
once again.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

d4d7fc12

03 4月, 2017 2 次提交

Revert "GFS2: Wait for iopen glock dequeues" · d4da3198

由 Andreas Gruenbacher 提交于 2月 22, 2017

Revert commit 86d067a7: it turns out
that waiting for iopen glock dequeues here isn't needed anymore because
the bugs that commit was meant to fix have been fixed otherwise.

In addition, we want to avoid waiting on glocks in gfs2_evict_inode in
shrinker context because the shrinker may be invoked on behalf of DLM,
in which case calling into DLM again would deadlock.  This commit makes
the described scenario less likely without completely avoiding it; it's
still a step in the right direction, though.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

d4da3198

gfs2: Switch to rhashtable_lookup_get_insert_fast · 0a52aba7

由 Andreas Gruenbacher 提交于 2月 21, 2017

Switch from rhashtable_lookup_insert_fast + rhashtable_lookup_fast to
rhashtable_lookup_get_insert_fast, which is cleaner and avoids an extra
rhashtable lookup.

At the same time, turn the retry loop in gfs2_glock_get into an infinite
loop.  The lookup or insert will eventually succeed, usually very fast,
but there is no reason to give up trying at a fixed number of
iterations.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

0a52aba7

17 3月, 2017 1 次提交

GFS2: Temporarily zero i_no_addr when creating a dinode · cc963a11

由 Bob Peterson 提交于 3月 16, 2017

Before this patch i_no_addr was not initialized until after the
return from allocating its block. That meant the i_no_addr was
temporarily uninitialized storage. Ordinarily that's not a concern,
but if inplace_reserve can't find space, it can call try_rgrp_unlink
which references i_no_addr as a block to avoid. That can result in
unpredictable behavior. More importantly, the trace point in
gfs2_alloc_blocks references ip->i_no_addr before it is set, which
is misleading when reading the kernel traces. This patch makes it
look like the new dinode block was assigned in the name of inode 0
rather than a random inode that's completely unrelated.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

cc963a11

16 3月, 2017 3 次提交

gfs2: Don't pack struct lm_lockname · 972b044e

由 Andreas Gruenbacher 提交于 3月 16, 2017

As per a suggestion by Linus, don't pack struct lm_lockname: we did that
because the struct is used as a rhashtable key, but packing tells the
compiler that the 64-bit fields in the struct may be unaligned, causing
it to generate worse code on some architectures. Instead, rearrange the
fields in the struct so that there is no padding between fields, and
exclude any tail padding from the hash key size.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

972b044e

gfs2: Deduplicate gfs2_{glocks,glstats}_open · 92ecd73a

由 Andreas Gruenbacher 提交于 3月 09, 2017

Both functions are identical except for the seq_operations used.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

92ecd73a

gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter · cc37a627

由 Andreas Gruenbacher 提交于 3月 09, 2017

Function rhashtable_walk_init is deprecated.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

cc37a627

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功