提交 · f0dc7f9c6dd99891611fca5849cbc4c6965b690e · openeuler / Kernel

04 6月, 2018 1 次提交

GFS2: gfs2_free_extlen can return an extent that is too long · dc8fbb03

由 Bob Peterson 提交于 6月 01, 2018

Function gfs2_free_extlen calculates the length of an extent of
free blocks that may be reserved. The end pointer was calculated as
end = start + bh->b_size but b_size is incorrect because the
bitmap usually stops prior to the end of the buffer data on
the last bitmap.

What this means is that when you do a write, you can reserve a
chunk of blocks that runs off the end of the last bitmap. For
example, I've got a file system where there is only one bitmap
for each rgrp, so ri_length==1. I saw cases in which iozone
tried to do a big write, grabbed a large block reservation,
chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188.
So 5464153 + 8188 = 5472341 which is the end of the rgrp.

When it grabbed a reservation it got back: 5470936, length 7229.
But 5470936 + 7229 = 5478165. So the reservation starts inside
the rgrp but runs 5824 blocks past the end of the bitmap.

This patch fixes the calculation so it won't exceed the last
bitmap. It also adds a BUG_ON to guard against overflows in the
future.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

dc8fbb03

31 1月, 2018 1 次提交

gfs2: Add a few missing newlines in messages · af38816e

由 Andreas Gruenbacher 提交于 1月 30, 2018

Some of the info, warning, and error messages are missing their trailing
newline.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

af38816e

23 1月, 2018 2 次提交

GFS2: Log the reason for log flushes in every log header · 805c0907

由 Bob Peterson 提交于 1月 08, 2018

This patch just adds the capability for GFS2 to track which function
called gfs2_log_flush. This should make it easier to diagnose
problems based on the sequence of events found in the journals.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>

805c0907

GFS2: Introduce new gfs2_log_header_v2 · c1696fb8

由 Bob Peterson 提交于 1月 17, 2018

This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible).  Some of these are used for
debug purposes so we can backtrack when problems occur.  Others are
reserved for future expansion.

This patch is based on a prototype from Steve Whitehouse.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>

c1696fb8

17 1月, 2018 1 次提交

gfs2: Add gfs2_blk2rgrpd comment and fix incorrect use · 90bcab99

由 Steven Whitehouse 提交于 12月 22, 2017

Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching.  Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

90bcab99

13 12月, 2017 3 次提交

gfs2: Add a crc field to resource group headers · 850d2d91

由 Andrew Price 提交于 12月 12, 2017

Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This
allows us to check resource group headers' integrity and removes the
requirement to check them against the rindex entries in fsck. If this
field is found to be zero, it should be ignored (or updated with an
accurate value).
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

850d2d91

gfs2: Add rindex fields to rgrp headers · 166725d9

由 Andrew Price 提交于 12月 12, 2017

Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields
are identical to their counterparts in struct gfs2_rindex and are
intended to reduce the use of the rindex. For now the fields are only
written back as the in-memory equivalents in struct gfs2_rgrpd are set
using values from the rindex. However, they are needed at this point so
that userspace can make use of them, allowing a migration away from the
rindex over time.

The new fields take up previously reserved space which was explicitly
zeroed on write so, in clusters with mixed kernels, these fields could
get zeroed after being set and this should not be treated as an error.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

166725d9

gfs2: Add a next-resource-group pointer to resource groups · 65adc273

由 Andrew Price 提交于 12月 12, 2017

Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
rg_skip field has the following meaning:

- If rg_skip is zero, it is considered unset and not useful.
- If rg_skip is non-zero, its value will be the number of blocks between
  this rgrp's address and the next rgrp's address. This can be used as a
  hint by fsck.gfs2 when rebuilding a bad rindex, for example.

This will provide less dependency on the rindex in future, and allow
tools such as fsck.gfs2 to iterate the resource groups without keeping
the rindex around.

The field is updated in gfs2_rgrp_out() so that existing file systems
will have it set. This means that any resource groups that aren't ever
written will not be updated. The final rgrp is a special case as there
is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is
extended).

Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
the rg_skip field can get set back to 0 in cases where nodes with and
without this patch are mixed in a cluster. In some cases, the field may
bounce between being set by one node and then zeroed by another which
may harm performance slightly, e.g. when two nodes create many small
files. In testing this situation is rare but it becomes more likely as
the filesystem fills up and there are fewer resource groups to choose
from. The problem goes away when all nodes are running with this patch.
Dipping into the space currently occupied by the rg_reserved field would
have resulted in the same problem as it is also explicitly zeroed, so
unfortunately there is no other way around it.
Signed-off-by: NAndrew Price <anprice@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

65adc273

28 11月, 2017 1 次提交

GFS2: Combine gfs2_free_di with gfs2_free_uninit_di · a18c78c5

由 Bob Peterson 提交于 11月 22, 2017

Before this patch, function gfs2_free_di was 4 lines of code, and
one of those lines was to call gfs2_free_uninit_di. Although
unlikely, if function gfs2_free_uninit_di encountered an error
finding the block to be freed, the error was silently ignored by the
caller, which went ahead and improperly did a quota-change operation
and meta_wipe despite the error. This patch combines the two
functions into one to make the code more readable and fixes the bug
by returning from the combined function before it takes those next
incorrect steps.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

a18c78c5

30 8月, 2017 1 次提交

GFS2: Fix gl_object warnings · 7023a0b1

由 Andreas Gruenbacher 提交于 8月 30, 2017

The following cleanup is needed to avoid spilling the syslog with
false warnings.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

7023a0b1

09 8月, 2017 1 次提交

GFS2: Don't bother trying to add rgrps to the lru list · 2d821a8b

由 Bob Peterson 提交于 7月 26, 2017

This patch removes a call to gfs2_glock_add_to_lru from function
gfs2_clear_rgrpd. The call is just a waste of time because as soon
as it adds it to the lru_list, the call to gfs2_glock_put takes it
back off again.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

2d821a8b

05 7月, 2017 1 次提交

gfs2: Protect gl->gl_object by spin lock · 6f6597ba

由 Andreas Gruenbacher 提交于 6月 30, 2017

Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

6f6597ba

19 4月, 2017 1 次提交

GFS2: Non-recursive delete · d552a2b9

由 Bob Peterson 提交于 2月 06, 2017

Implement truncate/delete as a non-recursive algorithm. The older
algorithm was implemented with recursion to strip off each layer
at a time (going by height, starting with the maximum height.
This version tries to do the same thing but without recursion,
and without needing to allocate new structures or lists in memory.

For example, say you want to truncate a very large file to 1 byte,
and its end-of-file metapath is: 0.505.463.428. The starting
metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
needs to preserve that byte, and all metadata pointing to it.
So it would start at 0.0.0.0, look up all its metadata buffers,
then free all data blocks pointed to at the highest level.
After that buffer is "swept", it moves on to 0.0.0.1, then
0.0.0.2, etc., reading in buffers and sweeping them clean.
When it gets to the end of the 0.0.0 metadata buffer (for 4K
blocks the last valid one is 0.0.0.508), it backs up to the
previous height and starts working on 0.0.1.0, then 0.0.1.1,
and so forth. After it reaches the end and sweeps 0.0.1.508,
it continues with 0.0.2.0, and so on. When that height is
exhausted, and it reaches 0.0.508.508 it backs up another level,
to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
marching backwards and forwards through the metadata until it's
all swept clean. Once it has all the data blocks freed, it
lowers the strip height, and begins the process all over again,
but with one less height. This time it sweeps 0.0.0 through
0.505.463. When that's clean, it lowers the strip height again
and works to free 0.505. Eventually it strips the lowest height, 0.
For a delete or truncate to 0, all metadata for all heights of
0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
be preserved.

This isn't much different from normal integer incrementing,
where an integer gets incremented from 0000 (0.0.0.0) to 3021
(3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
just that each "digit" goes from 0 to 508 (for a total of 509
pointers) rather than from 0 to 9.

Note that the dinode will only have 483 pointers due to the
dinode structure itself.

Also note: this is just an example. These numbers (509 and 483)
are based on a standard 4K block size. Smaller block sizes will
yield smaller numbers of indirect pointers accordingly.

The truncation process is accomplished with the help of two
major functions and a few helper functions.

Functions do_strip and recursive_scan are obsolete, so removed.

New function sweep_bh_for_rgrps cleans a buffer_head pointed to
by the given metapath and height. By cleaning, I mean it frees
all blocks starting at the offset passed in metapath. It starts
at the first block in the buffer pointed to by the metapath and
identifies its resource group (rgrp). From there it frees all
subsequent block pointers that lie within that rgrp. If it's
already inside a transaction, it stays within it as long as it
can. In other words, it doesn't close a transaction until it knows
it's freed what it can from the resource group. In this way,
multiple buffers may be cleaned in a single transaction, as long
as those blocks in the buffer all lie within the same rgrp.

If it's not in a transaction, it starts one. If the buffer_head
has references to blocks within multiple rgrps, it frees all the
blocks inside the first rgrp it finds, then closes the
transaction. Then it repeats the cycle: identifies the next
unfreed block, uses it to find its rgrp, then starts a new
transaction for that set. It repeats this process repeatedly
until the buffer_head contains no more references to any blocks
past the given metapath.

Function trunc_dealloc has been reworked into a finite state
automaton. It has basically 3 active states:
DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:

The DEALLOC_MP_FULL state implies the metapath has a full set
of buffers out to the "shrink height", and therefore, it can
call function sweep_bh_for_rgrps to free the blocks within the
highest height of the metapath. If it's just swept the lowest
level (or an error has occurred) the state machine is ended.
Otherwise it proceeds to the DEALLOC_MP_LOWER state.

The DEALLOC_MP_LOWER state implies we are finished with a given
buffer_head, which may now be released, and therefore we are
then missing some buffer information from the metapath. So we
need to find more buffers to read in. In most cases, this is
just a matter of releasing the buffer_head and moving to the
next pointer from the previous height, so it may be read in and
swept as well. If it can't find another non-null pointer to
process, it checks whether it's reached the end of a height
and needs to lower the strip height, or whether it still needs
move forward through the previous height's metadata. In this
state, all zero-pointers are skipped. From this state, it can
only loop around (once more backing up another height) or,
once a valid metapath is found (one that has non-zero
pointers), proceed to state DEALLOC_FILL_MP.

The DEALLOC_FILL_MP state implies that we have a metapath
but not all its buffers are read in. So we must proceed to read
in buffer_heads until the metapath has a valid buffer for every
height. If the previous state backed us up 3 heights, we may
need to read in a buffer, increment the height, then repeat the
process until buffers have been read in for all required heights.
If it's successful reading a buffer, and it's at the highest
height we need, it proceeds back to the DEALLOC_MP_FULL state.
If it's unable to fill in a buffer, (encounters a hole, etc.)
it tries to find another non-zero block pointer. If they're all
zero, it lowers the height and returns to the DEALLOC_MP_LOWER
state. If it finds a good non-null pointer, it loops around and
reads it in, while keeping the metapath in lock-step with the
pointers it examines.

The state machine runs until the truncation request is
satisfied. Then any transactions are ended, the quota and
statfs data are updated, and the function is complete.

Helper function metaptr1 was introduced to be an easy way to
determine the start of a buffer_head's indirect pointers.

Helper function lookup_mp_height was introduced to find a
metapath index and read in the buffer that corresponds to it.
In this way, function lookup_metapath becomes a simple loop to
call it for every height.

Helper function fillup_metapath is similar to lookup_metapath
except it can do partial lookups. If the state machine
backed up multiple levels (like 2999 wrapping to 3000) it
needs to find out the next starting point and start issuing
metadata reads at that point.

Helper function hptrs is a shortcut to determine how many
pointers should be expected in a buffer. Height 0 is the dinode
which has fewer pointers than the others.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

d552a2b9

13 7月, 2016 1 次提交

GFS2: Check rs_free with rd_rsspin protection · 44f52122

由 Bob Peterson 提交于 7月 06, 2016

For the last process to close a file opened for write, function
gfs2_rsqa_delete was deleting the file's inode's block reservation
out of the rgrp reservations tree. Then it was checking to make sure
rs_free was 0, but it was performing the check outside the protection
of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed
to prevent a race between the process freeing the reservation and
another who is allocating a new set of blocks inside the same rgrp
for the same inode, thus changing its value.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

44f52122

27 6月, 2016 1 次提交

gfs2: Lock holder cleanup · 6df9f9a2

由 Andreas Gruenbacher 提交于 6月 17, 2016

Make the code more readable by cleaning up the different ways of
initializing lock holders and checking for initialized lock holders:
mark lock holders as uninitialized by setting the holder's glock to NULL
(gfs2_holder_mark_uninitialized) instead of zeroing out the entire
object or using a separate flag.  Recognize initialized holders by their
non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
which are immeditiately initialized via gfs2_holder_init or
gfs2_glock_nq_init.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

6df9f9a2

10 6月, 2016 1 次提交

GFS2: don't set rgrp gl_object until it's inserted into rgrp tree · 36e4ad03

由 Bob Peterson 提交于 6月 09, 2016

Before this patch, function read_rindex_entry would set a rgrp
glock's gl_object pointer to itself before inserting the rgrp into
the rgrp rbtree. The problem is: if another process was also reading
the rgrp in, and had already inserted its newly created rgrp, then
the second call to read_rindex_entry would overwrite that value,
then return a bad return code to the caller. Later, other functions
would reference the now-freed rgrp memory by way of gl_object.
In some cases, that could result in gfs2_rgrp_brelse being called
twice for the same rgrp: once for the failed attempt and once for
the "real" rgrp release. Eventually the kernel would panic.
There are also a number of other things that could go wrong when
a kernel module is accessing freed storage. For example, this could
result in rgrp corruption because the fake rgrp would point to a
fake bitmap in memory too, causing gfs2_inplace_reserve to search
some random memory for free blocks, and find some, since we were
never setting rgd->rd_bits to NULL before freeing it.

This patch fixes the problem by not setting gl_object until we
have successfully inserted the rgrp into the rbtree. Also, it sets
rd_bits to NULL as it frees them, which will ensure any accidental
access to the wrong rgrp will result in a kernel panic rather than
file system corruption, which is preferred.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

36e4ad03

02 5月, 2016 1 次提交

GFS2: Remove allocation parms from gfs2_rbm_find · 8381e602

由 Bob Peterson 提交于 5月 02, 2016

Struct gfs2_alloc_parms ap is never referenced in function
gfs2_rbm_find, so this patch removes it.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

8381e602

05 4月, 2016 1 次提交

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

19 12月, 2015 1 次提交

GFS2: Always use iopen glock for gl_deletes · 5ea31bc0

由 Bob Peterson 提交于 12月 04, 2015

Before this patch, when function try_rgrp_unlink queued a glock for
delete_work to reclaim the space, it used the inode glock to do so.
That's different from the iopen callback which uses the iopen glock
for the same purpose. We should be consistent and always use the
iopen glock. This may also save us reference counting problems with
the inode glock, since clear_glock does an extra glock_put() for the
inode glock.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

5ea31bc0

15 12月, 2015 1 次提交

GFS2: Make rgrp reservations part of the gfs2_inode structure · a097dc7e

由 Bob Peterson 提交于 7月 16, 2015

Before this patch, multi-block reservation structures were allocated
from a special slab. This patch folds the structure into the gfs2_inode
structure. The disadvantage is that the gfs2_inode needs more memory,
even when a file is opened read-only. The advantages are: (a) we don't
need the special slab and the extra time it takes to allocate and
deallocate from it. (b) we no longer need to worry that the structure
exists for things like quota management. (c) This also allows us to
remove the calls to get_write_access and put_write_access since we
know the structure will exist.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

a097dc7e

24 11月, 2015 1 次提交

GFS2: Extract quota data from reservations structure (revert ) · b54e9a0b

由 Bob Peterson 提交于 10月 26, 2015

This patch basically reverts the majority of patch 5407e242.
That patch eliminated the gfs2_qadata structure in favor of just
using the reservations structure. The problem with doing that is that
it increases the size of the reservations structure. That is not an
issue until it comes time to fold the reservations structure into the
inode in memory so we know it's always there. By separating out the
quota structure again, we aren't punishing the non-quota users by
making all the inodes bigger, requiring more slab space. This patch
creates a new slab area to allocate the quota stuff so it's managed
a little more sanely.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

b54e9a0b

17 11月, 2015 1 次提交

gfs2: Extended attribute readahead · c8d57703

由 Andreas Gruenbacher 提交于 11月 11, 2015

When gfs2 allocates an inode and its extended attribute block next to
each other at inode create time, the inode's directory entry indicates
that in de_rahead.  In that case, we can readahead the extended
attribute block when we read in the inode.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

c8d57703

09 11月, 2015 1 次提交

GFS2: Fix rgrp end rounding problem for bsize < page size · 31dddd9e

由 Bob Peterson 提交于 10月 28, 2015

This patch fixes a bug introduced by commit 7005c3e4. That patch
tries to map a vm range for resource groups, but the calculation
breaks down when the block size is less than the page size.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

31dddd9e

30 10月, 2015 1 次提交

gfs2: Remove gl_spin define · f3dd1649

由 Andreas Gruenbacher 提交于 10月 29, 2015

Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a
gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in
gl_lockref).  Remove that define to make the references to gl_lockref.lock more
obvious.
Signed-off-by: NAndreas Gruenbacher <andreas.gruenbacher@gmail.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

f3dd1649

04 9月, 2015 2 次提交

gfs2: Make statistics unsigned, suitable for use with do_div() · 4d207133

由 Ben Hutchings 提交于 8月 27, 2015

None of these statistics can meaningfully be negative, and the
numerator for do_div() must have the type u64.  The generic
implementation of do_div() used on some 32-bit architectures asserts
that, resulting in a compiler error in gfs2_rgrp_congested().

Fixes: 0166b197 ("GFS2: Average in only non-zero round-trip times ...")
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NAndreas Gruenbacher <agruenba@redhat.com>

4d207133

GFS2: Move glock superblock pointer to field gl_name · 15562c43

由 Bob Peterson 提交于 3月 16, 2015

What uniquely identifies a glock in the glock hash table is not
gl_name, but gl_name and its superblock pointer. This patch makes
the gl_name field correspond to a unique glock identifier. That will
allow us to simplify hashing with a future patch, since the hash
algorithm can then take the gl_name and hash its components in one
operation.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>

15562c43

19 6月, 2015 1 次提交

GFS2: Don't brelse rgrp buffer_heads every allocation · 39b0f1e9

由 Bob Peterson 提交于 6月 05, 2015

This patch allows the block allocation code to retain the buffers
for the resource groups so they don't need to be re-read from buffer
cache with every request. This is a performance improvement that's
especially noticeable when resource groups are very large. For
example, with 2GB resource groups and 4K blocks, there can be 33
blocks for every resource group. This patch allows those 33 buffers
to be kept around and not read in and thrown away with every
operation. The buffers are released when the resource group is
either synced or invalidated.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NSteven Whitehouse <swhiteho@redhat.com>
Reviewed-by: NBenjamin Marzinski <bmarzins@redhat.com>

39b0f1e9

19 5月, 2015 1 次提交

gfs2: fix shadow warning in gfs2_rbm_find() · a3e32136

由 Fabian Frederick 提交于 5月 18, 2015

bi was already declared and initialized globally in gfs2_rbm_find()
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

a3e32136

06 5月, 2015 1 次提交

gfs2: handle NULL rgd in set_rgrp_preferences · 959b6717

由 Abhi Das 提交于 5月 05, 2015

The function set_rgrp_preferences() does not handle the (rarely
returned) NULL value from gfs2_rgrpd_get_next() and this patch
fixes that.

The fs image in question is only 150MB in size which allows for
only 1 rgrp to be created. The in-memory rb tree has only 1 node
and when gfs2_rgrpd_get_next() is called on this sole rgrp, it
returns NULL. (Default behavior is to wrap around the rb tree and
return the first node to give the illusion of a circular linked
list. In the case of only 1 rgrp, we can't have
gfs2_rgrpd_get_next() return the same rgrp (first, last, next all
point to the same rgrp)... that would cause unintended consequences
and infinite loops.)
Signed-off-by: NAbhi Das <adas@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>

959b6717

24 4月, 2015 2 次提交

GFS2: Average in only non-zero round-trip times for congestion stats · 0166b197

由 Bob Peterson 提交于 4月 22, 2015

This patch changes function gfs2_rgrp_congested so that it only factors
in non-zero values into its average round trip time. If the round-trip
time is zero for a particular cpu, that cpu has obviously never dealt
with bouncing the resource group in question, so factoring in a zero
value will only skew the numbers. It also fixes a compile error on
some arches related to division.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>

0166b197

GFS2: Use average srttb value in congestion calculations · f4a3ae93

由 Bob Peterson 提交于 11月 19, 2014

This patch changes function gfs2_rgrp_congested so that it uses an
average srttb (smoothed round trip time for blocking rgrp glocks)
rather than the CPU-specific value. If we use the CPU-specific value
it can incorrectly report no contention when there really is contention
due to the glock processing occurring on a different CPU.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>

f4a3ae93

19 3月, 2015 1 次提交

gfs2: allow quota_check and inplace_reserve to return available blocks · 25435e5e

由 Abhi Das 提交于 3月 18, 2015

struct gfs2_alloc_parms is passed to gfs2_quota_check() and
gfs2_inplace_reserve() with ap->target containing the number of
blocks being requested for allocation in the current operation.

We add a new field to struct gfs2_alloc_parms called 'allowed'.
gfs2_quota_check() and gfs2_inplace_reserve() return the max
blocks allowed by quota and the max blocks allowed by the chosen
rgrp respectively in 'allowed'.

A new field 'min_target', when non-zero, tells gfs2_quota_check()
and gfs2_inplace_reserve() to not return -EDQUOT/-ENOSPC when
there are atleast 'min_target' blocks allowable/available. The
assumption is that the caller is ok with just 'min_target' blocks
and will likely proceed with allocating them.
Signed-off-by: NAbhi Das <adas@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>

25435e5e

04 11月, 2014 2 次提交

GFS2: If we use up our block reservation, request more next time · 1a855033

由 Bob Peterson 提交于 10月 29, 2014

If we run out of blocks for a given multi-block allocation, we obviously
did not reserve enough. We should reserve more blocks for the next
reservation to reduce fragmentation. This patch increases the size hint
for reservations when they run out.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

1a855033

GFS2: Set of distributed preferences for rgrps · 0e27c18c

由 Bob Peterson 提交于 10月 29, 2014

This patch tries to use the journal numbers to evenly distribute
which node prefers which resource group for block allocations. This
is to help performance.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

0e27c18c

03 10月, 2014 1 次提交

GFS2: Use gfs2_rbm_incr in rgblk_free · d24e0569

由 Bob Peterson 提交于 10月 03, 2014

This patch speeds up GFS2 unlink operations by using function
gfs2_rbm_incr rather than continuously calculating the rbm.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

d24e0569

19 9月, 2014 1 次提交

GFS2: fix bad inode i_goal values during block allocation · 00a158be

由 Abhi Das 提交于 9月 18, 2014

This patch checks if i_goal is either zero or if doesn't exist
within any rgrp (i.e gfs2_blk2rgrpd() returns NULL). If so, it
assigns the ip->i_no_addr block as the i_goal.

There are two scenarios where a bad i_goal can result in a
-EBADSLT error.

1. Attempting to allocate to an existing inode:
Control reaches gfs2_inplace_reserve() and ip->i_goal is bad.
We need to fix i_goal here.

2. A new inode is created in a directory whose i_goal is hosed:
In this case, the parent dir's i_goal is copied onto the new
inode. Since the new inode is not yet created, the ip->i_no_addr
field is invalid and so, the fix in gfs2_inplace_reserve() as per
1) won't work in this scenario. We need to catch and fix it sooner
in the parent dir itself (gfs2_create_inode()), before it is
copied to the new inode.
Signed-off-by: NAbhi Das <adas@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

00a158be

18 7月, 2014 1 次提交

GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes · 27ff6a0f

由 Fabian Frederick 提交于 7月 02, 2014

Cc: cluster-devel@redhat.com
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

27ff6a0f

14 5月, 2014 1 次提交

GFS2: remove transaction glock · 24972557

由 Benjamin Marzinski 提交于 5月 01, 2014

GFS2 has a transaction glock, which must be grabbed for every
transaction, whose purpose is to deal with freezing the filesystem.
Aside from this involving a large amount of locking, it is very easy to
make the current fsfreeze code hang on unfreezing.

This patch rewrites how gfs2 handles freezing the filesystem. The
transaction glock is removed. In it's place is a freeze glock, which is
cached (but not held) in a shared state by every node in the cluster
when the filesystem is mounted. This lock only needs to be grabbed on
freezing, and actions which need to be safe from freezing, like
recovery.

When a node wants to freeze the filesystem, it grabs this glock
exclusively. When the freeze glock state changes on the nodes (either
from shared to unlocked, or shared to exclusive), the filesystem does a
special log flush. gfs2_log_flush() does all the work for flushing out
the and shutting down the incore log, and then it tries to grab the
freeze glock in a shared state again. Since the filesystem is stuck in
gfs2_log_flush, no new transaction can start, and nothing can be written
to disk. Unfreezing the filesytem simply involes dropping the freeze
glock, allowing gfs2_log_flush() to grab and then release the shared
lock, so it is cached for next time.

However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
shared lock on the filesystem root directory inode to check permissions.
If that glock has already been grabbed exclusively, fsfreeze will be
unable to get the shared lock and unfreeze the filesystem.

In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
on the filesystem root directory during the freeze, and hold it until it
unfreezes the filesystem. The functions which need to grab a shared
lock in order to allow the unfreeze ioctl to be issued now use the lock
grabbed by the freeze code instead.

The freeze and unfreeze code take care to make sure that this shared
lock will not be dropped while another process is using it.
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

24972557

07 3月, 2014 2 次提交

GFS2: Use pr_<level> more consistently · d77d1b58

由 Joe Perches 提交于 3月 06, 2014

Add pr_fmt, remove embedded "GFS2: " prefixes.
This now consistently emits lower case "gfs2: " for each message.

Other miscellanea around these changes:

o Add missing newlines
o Coalesce formats
o Realign arguments
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

d77d1b58

GFS2: global conversion to pr_foo() · fc554ed3

由 Fabian Frederick 提交于 3月 05, 2014

-All printk(KERN_foo converted to pr_foo().
-Messages updated to fit in 80 columns.
-fs_macros converted as well.
-fs_printk removed.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

fc554ed3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功