- 05 March 2012, 1 commit
-
-
Committed by Bob Peterson
Over time, we've slowly eliminated the use of sd_rindex_mutex. Up to this point, it was only used in two places: function gfs2_ri_total (which totals the file system size by reading and parsing the rindex file) and function gfs2_rindex_update (which updates the rgrps in memory). Both of these functions have the rindex glock to protect them, so the mutex is unnecessary. Since gfs2_grow writes to the rindex via the meta_fs, the mutex is taken in the wrong order according to the normal locking rules. This patch eliminates the mutex entirely to avoid the problem.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 01 March 2012, 1 commit
-
-
Committed by Bob Peterson
This patch fixes an error path in function gfs2_rindex_update that leaves the rindex mutex held.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 29 February 2012, 5 commits
-
-
Committed by Steven Whitehouse
Add missing static to bd_cmp().
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
This patch sorts the ordered write list for GFS2 writes, which increases the throughput of simultaneous writes. For example, with ten processes each doing dd if=/dev/zero of=/mnt/gfs2/fileX to different files, the throughput will be much better.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
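A sketch of what such a sort might look like, assuming a list_sort()-based comparator that orders queued buffers by on-disk block number. The structure and field names below are illustrative, not the patch's code; the comparator signature matches the list_sort() of that kernel era.

#include <linux/list.h>
#include <linux/list_sort.h>
#include <linux/buffer_head.h>

/* Hypothetical per-buffer log descriptor, standing in for the real
 * GFS2 structure; only what the comparator needs is shown. */
struct ordered_buf {
	struct list_head list;	/* link on the ordered-write list */
	struct buffer_head *bh;	/* the buffer to be written       */
};

/* Comparator for list_sort(): ascending on-disk block number, so the
 * ordered-write list is submitted in roughly sequential disk order. */
static int ordered_cmp(void *priv, struct list_head *a, struct list_head *b)
{
	struct ordered_buf *oa = list_entry(a, struct ordered_buf, list);
	struct ordered_buf *ob = list_entry(b, struct ordered_buf, list);

	if (oa->bh->b_blocknr < ob->bh->b_blocknr)
		return -1;
	if (oa->bh->b_blocknr > ob->bh->b_blocknr)
		return 1;
	return 0;
}

/* Usage: list_sort(NULL, &ordered_list, ordered_cmp); before submitting. */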
-
Committed by Steven Whitehouse
The FITRIM ioctl provides an alternative way to send discard requests to the underlying device. Using the discard mount option results in every freed block generating a discard request to the block device. This can be slow, since many block devices can only process discard requests of larger sizes, and such operations can be time consuming. Rather than using the discard mount option, FITRIM allows an occasional sweep of the filesystem, and optionally avoids sending down discard requests for smaller regions. In GFS2, FITRIM works at resource group granularity. Each resource group has a flag which keeps track of whether it has been trimmed. This flag is reset whenever a deallocation occurs in the resource group, and set whenever a successful FITRIM of that resource group has taken place. This helps to reduce repeated discard requests for the same block ranges, again improving performance.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
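For reference, a minimal userspace sketch of driving FITRIM against a mounted filesystem. This is generic ioctl usage, not code from the patch; the 1 MiB minimum extent length is an arbitrary example value.

#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	struct fstrim_range range;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&range, 0, sizeof(range));
	range.start = 0;
	range.len = UINT64_MAX;		/* sweep the whole filesystem        */
	range.minlen = 1024 * 1024;	/* skip free ranges smaller than 1MiB */

	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("FITRIM");
		close(fd);
		return 1;
	}

	/* On return, range.len holds the number of bytes actually trimmed. */
	printf("trimmed %llu bytes\n", (unsigned long long)range.len);
	close(fd);
	return 0;
}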
-
Committed by Steven Whitehouse
gfs2_log_get_buf() and gfs2_log_fake_buf() are both used only in lops.c, so move them next to their callers; they can then become static.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse
The stats are divided into two sets: those relating to the super block and those relating to an individual glock. The super block stats are done on a per-cpu basis in order to try and reduce the overhead of gathering them. They are also further divided by glock type. In the case of both the super block and glock statistics, the same information is gathered in each case. The super block statistics are used to provide default values for most of the glock statistics, so that newly created glocks should have, as far as possible, a sensible starting point.

The statistics are divided into three pairs of mean and variance, plus two counters. The mean/variance pairs are smoothed exponential estimates, and the algorithm used will be very familiar to anyone used to calculating round trip times in network code. The three pairs of mean/variance measure the following things:

1. DLM lock time (non-blocking requests)
2. DLM lock time (blocking requests)
3. Inter-request time (again to the DLM)

A non-blocking request is one which will complete right away, whatever the state of the DLM lock in question. That currently means any request when (a) the current state of the lock is exclusive, (b) the requested state is either null or unlocked, or (c) the "try lock" flag is set. A blocking request covers all other lock requests.

There are two counters. The first is there primarily to show how many lock requests have been made, and thus how much data has gone into the mean/variance calculations. The other counter counts queueing of holders at the top layer of the glock code. Hopefully that number will be a lot larger than the number of dlm lock requests issued.

So why gather these statistics? There are several reasons we'd like to get a better idea of these timings:

1. To be able to better set the glock "min hold time"
2. To spot performance issues more easily
3. To improve the algorithm for selecting resource groups for allocation (to base it on lock wait time, rather than blindly using a "try lock")

Due to the smoothing action of the updates, a step change in some input quantity being sampled will only fully be taken into account after 8 samples (or 4 for the variance), and this needs to be carefully considered when interpreting the results. Knowing both the time it takes a lock request to complete and the average time between lock requests for a glock means we can compute the total percentage of the time for which the node is able to use a glock vs. the time that the rest of the cluster has its share. That will be very useful when setting the lock min hold time.

The other point to remember is that all times are in nanoseconds. Great care has been taken to ensure that we measure exactly the quantities that we want, as accurately as possible. There are always inaccuracies in any measuring system, but I hope this is as accurate as we can reasonably make it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
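A minimal sketch of the SRTT-style exponential smoothing described above (my illustration, not the patch's code); the 1/8 and 1/4 gains correspond to the 8-sample and 4-sample settling behaviour mentioned.

#include <stdint.h>

/* Exponentially smoothed mean/variance update, in nanoseconds.
 * Each new sample moves the mean by 1/8 of the error and the
 * deviation estimate by 1/4 of its error, so a step change in the
 * input is fully absorbed after roughly 8 (or 4) samples. */
static void stat_update(int64_t *mean_ns, int64_t *var_ns, int64_t sample_ns)
{
	int64_t delta = sample_ns - *mean_ns;
	int64_t adelta = delta < 0 ? -delta : delta;

	*mean_ns += delta / 8;			/* smoothed mean      */
	*var_ns  += (adelta - *var_ns) / 4;	/* smoothed deviation */
}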
-
- 28 February 2012, 4 commits
-
-
Committed by Steven Whitehouse
This makes mount take slightly longer, but at the same time, the first write to the filesystem will be faster too. It also means that if there is a problem in the resource index, then we can refuse to mount rather than having to try and report that when the first write occurs. In addition, to avoid recursive locking, we have to take account of instances when the rindex glock may already be held when we are trying to update the rbtree of resource groups.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
This patch fixes a problem whereby gfs2_grow was failing and causing GFS2 to assert. The problem was that when GFS2's fallocate operation tried to acquire an "allocation", it made sure the rindex was up to date, and if not, it called gfs2_rindex_update. However, if the file being fallocated was the rindex itself, it was already locked at that point. By calling gfs2_rindex_update at an earlier point in time, we bring the rindex up to date and thereby avoid trying to lock it when the "allocation" is acquired.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
This patch fixes a problem whereby you were unable to delete files until other file system operations were done (such as statfs, touch, writes, etc.) that caused the rindex to be read in.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse
This patch fixes a narrow race window between the glock ref count hitting zero and glocks being removed from the lru_list.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 11 January 2012, 5 commits
-
-
Committed by Steven Whitehouse
Since the nlink count will be 0, we need to use set_nlink rather than inc_nlink in order to avoid triggering the inc_nlink warning which was added recently.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by David Teigland
If the first mounter fails to recover one of the journals during mount, the mount should fail.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by David Teigland
Previously, a spectator mount would not even attempt to do journal recovery for a failed node. This meant that if all mounted nodes were spectators, everyone would be stuck after a node failed, all waiting for recovery to be performed. This is unnecessary since the failed node had a clean journal. Instead, allow a spectator mount to do a partial "read only" recovery, which means it will check whether the failed journal is clean, and if so, report a successful recovery. If the failed journal is not clean, it reports that journal recovery failed. This makes it work the same as a read only mount on a read only block device.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
In function gfs2_inplace_release, a gfs2_holder structure associated with a reservation was being unlocked after the reservation itself had already been freed. The problem is that the statements were in the wrong order. This patch corrects the order so that the reservation is freed after the gfs2_holder is unlocked.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
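In essence this is a use-after-free ordering fix. A hypothetical, self-contained illustration of the corrected ordering (structure and function names are mine, not GFS2's):

#include <stdlib.h>

/* Hypothetical stand-ins for the structures involved. */
struct holder { int locked; };
struct reservation { struct holder gh; };

static void holder_uninit(struct holder *gh) { gh->locked = 0; }

/* Correct ordering: release the holder embedded in the reservation
 * first, then free the reservation that contains it.  Freeing first
 * (the bug) would make the unlock touch freed memory. */
static void inplace_release(struct reservation *rs)
{
	holder_uninit(&rs->gh);
	free(rs);
}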
-
Committed by David Teigland
This new method of managing recovery is an alternative to the previous approach of using the userland gfs_controld:
- use dlm slot numbers to assign journal ids
- use dlm recovery callbacks to initiate journal recovery
- use a dlm lock to determine the first node to mount the fs
- use a dlm lock to track journals that need recovery
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 07 January 2012, 1 commit
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 04 January 2012, 8 commits
-
-
Committed by David Teigland
These new callbacks notify the dlm user about lock recovery. GFS2, and possibly others, need to be aware of when the dlm will be doing lock recovery for a failed lockspace member. In the past, this coordination has been done between dlm and file system daemons in userspace, which then direct their kernel counterparts. These callbacks allow the same coordination directly, and more simply.
Signed-off-by: David Teigland <teigland@redhat.com>
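As a rough illustration of the shape such an interface can take, a hypothetical sketch with invented names; the actual callbacks and signatures added by this series may differ.

#include <stdint.h>

/* Hypothetical slot description handed to the filesystem. */
struct ls_slot {
	int nodeid;	/* cluster node that failed or is a member */
	int slot;	/* its slot number within the lockspace    */
};

/* Hypothetical recovery callbacks a lockspace user might register:
 * recover_prep - lock recovery is about to begin, stop using locks
 * recover_slot - a member has been lost; its journal may need recovery
 * recover_done - recovery finished; the surviving membership is reported */
struct ls_recovery_ops {
	void (*recover_prep)(void *arg);
	void (*recover_slot)(void *arg, const struct ls_slot *lost);
	void (*recover_done)(void *arg, const struct ls_slot *members,
			     int num_members, int our_slot,
			     uint32_t generation);
};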
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
vfs_create() ignores everything outside of the 16-bit subset of its mode argument; switching it to umode_t is obviously equivalent, and it's the only caller of the method.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
vfs_mkdir() gets an int, but immediately drops everything that might not fit into umode_t, and it's the only caller of ->mkdir()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of moving it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
New helper (a wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
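Presumably the helper is little more than the obvious wrapper; a sketch, assuming it is named mnt_drop_write_file() and simply forwards the file's vfsmount:

#include <linux/fs.h>
#include <linux/mount.h>

/* Sketch: drop the write access previously obtained with
 * mnt_want_write_file(), by forwarding the file's vfsmount. */
void mnt_drop_write_file(struct file *file)
{
	mnt_drop_write(file->f_path.mnt);
}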
-
Committed by Al Viro
It's both faster (in the case when the file has been opened for write) and cleaner.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 06 December 2011, 1 commit
-
-
Committed by H Hartley Sweeten
Quiets the sparse noise: warning: symbol 'gfs2_initxattrs' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 23 November 2011, 1 commit
-
-
Committed by Steven Whitehouse
There is no need to have two versions of this function with slightly different arguments.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 22 November 2011, 4 commits
-
-
Committed by Steven Whitehouse
Clean up gfs2_alloc_blocks so that it takes the full extent length, rather than just the number of non-inode blocks, as an argument. That will only make a difference in the inode allocation case for now. Also, this fixes the extent length handling around gfs2_alloc_extent() so that multi-block allocations will work again. The rd_last_alloc block is set to the final block in the allocated extent (as per the update to i_goal, but referenced to a different start point). This also removes the dinode argument to rgblk_search(), which is no longer used.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. It also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Tejun Heo
There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or remove any race condition.
* Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing.
* Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing().
* Convert all refrigerator() users to try_to_freeze().
* Update documentation accordingly.
* While at it, add might_sleep() to try_to_freeze().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
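After such a change, try_to_freeze() presumably ends up roughly like the following (a sketch of the described behaviour as it might appear in include/linux/freezer.h, not a verbatim copy of the patch):

/* Sketch: do nothing unless the current task is being frozen; otherwise
 * enter __refrigerator() and report whether we scheduled out for freezing. */
static inline bool try_to_freeze(void)
{
	might_sleep();
	if (likely(!freezing(current)))
		return false;
	return __refrigerator(false);
}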
-
Committed by Bob Peterson
This patch splits function rgblk_search into a function that finds blocks to allocate (rgblk_search) and a function that assigns those blocks (gfs2_alloc_extent).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 21 November 2011, 3 commits
-
-
Committed by Steven Whitehouse
The trace point should take extlen, not *ndata, as the extent length.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
This patch is a revision of the one I previously posted; I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) into a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Steven Whitehouse
Add a sync of metadata after fallocate for O_SYNC files, to ensure that we meet expectations for everything being on disk in this case. Unfortunately, the offset and len parameters are modified during the course of the fallocate function, so I've had to add a couple of new variables to call generic_write_sync() at the end. I know that potentially this will sync data as well within the range, but I think that is a fairly harmless side-effect overall, since we would not normally expect there to be any dirty data within the range in question.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Benjamin Marzinski <bmarzins@redhat.com>
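A sketch of the pattern described above (function and variable names are mine, not the patch's): remember the original range before the allocation loop modifies offset/len, then let generic_write_sync() flush it for O_SYNC/O_DSYNC files (it is a no-op otherwise, using the signature of that kernel era).

#include <linux/fs.h>

static long fallocate_and_sync(struct file *file, loff_t offset, loff_t len)
{
	loff_t orig_offset = offset;	/* offset and len are consumed by */
	loff_t orig_len = len;		/* the allocation loop below      */
	int error = 0;

	/* ... allocation loop that advances offset and shrinks len ... */

	if (error == 0)
		error = generic_write_sync(file, orig_offset, orig_len);
	return error;
}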
-
- 18 November 2011, 1 commit
-
-
Committed by Bob Peterson
This patch removes the vestigial variable al_alloced from the gfs2_alloc structure. This is another baby step toward multi-block reservations. My next planned step is to decouple the quota variables from the gfs2_alloc structure so we can use a different method for allocations.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 15 November 2011, 2 commits
-
-
Committed by Bob Peterson
GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
Committed by Bob Peterson
This upstream patch had what I believe is an unintended consequence: http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-3.0-nmw.git;a=commitdiff;h=beca42486749c1538a5ed58fe9dcc9f26d428c93
The patch changed function get_local_rgrp such that it ONLY used TRY locks for RGRP searches. Prior to that patch, the code used TRY locks during the first loop, and if that was unsuccessful, it used normal blocking locks on subsequent searches. This patch changes it back to the old way.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 09 November 2011, 2 commits
-
-
Committed by Steven Whitehouse
As a result, we don't need to test it each time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
-
Committed by Steven Whitehouse
This was spotted by automated code analysis. In case reading an ACL xattr failed (only likely to happen if there is an I/O error, for example, and even then only with unstuffed xattrs, so pretty difficult to trigger), a small amount of memory could potentially be leaked. This patch adds a kfree to the error path, and also removes a test which is no longer required (gfs2_ea_get_copy always returns either a negative error or a length).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-
- 08 November 2011, 1 commit
-
-
Committed by Steven Whitehouse
This fixes a potentially uninitialised variable and some unreachable code, and, as the main part of the patch, fixes the error path in the unlink function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
-