Commits · ab9bbda0204dfd0e5342562d9979d1241b14ea5f · openeuler / raspberrypi-kernel

21 Oct, 2011 7 commits

Authored by Steven Whitehouse on Aug 15, 2011

The aim of this patch is to use the newly enhanced ->dirty_inode()
super block operation to deal with atime updates, rather than
piggybacking that code into ->write_inode() as is currently
done.

The net result is a simplification of the code in various places
and a reduction of the number of gfs2_dinode_out() calls since
this is now implied by ->dirty_inode().

Some of the mark_inode_dirty() calls have been moved under glocks
in order to take advantage of then being able to avoid locking in
->dirty_inode() when we already have suitable locks.

One consequence is that generic_write_end() now correctly deals
with file size updates, so that we do not need a separate check
for that afterwards. This also, indirectly, means that fdatasync
should work correctly on GFS2 - the current code always syncs the
metadata whether it needs to or not.

Has survived testing with postmark (with and without atime) and
also fsx.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

ab9bbda0

GFS2: Fix bug trap and journaled data fsync · f1818529

Authored by Steven Whitehouse on Aug 05, 2011

Journaled data requires that a complete flush of all dirty data for
the file is done, in order that the ail flush which comes after
will succeed.

Also the recently enhanced bug trap can trigger falsely in case
an ail flush from fsync races with a page read. This updates the
bug trap such that it will ignore buffers which are locked and
only trigger on dirty and/or pinned buffers when the ail flush
is run from fsync. The original bug trap is retained when ail
flush is run from ->go_sync().
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

f1818529

GFS2: Fix inode allocation error path · 40ac218f

Authored by Steven Whitehouse on Aug 02, 2011

If we have got far enough through the inode allocation code
path that an inode has already been allocated, then we must
call iput to dispose of it, if an error occurs during a
later part of the process. This will always be the final iput
since there will be no other references to the inode.

Unlike when the inode has been unlinked, its block state will
be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
to skip the test in ->evict_inode() for this one case in order
to ensure that it will be deallocated correctly. This patch adds
a new flag in order to ensure that this will happen correctly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

40ac218f

GFS2: Make atime checks more efficient · 1d4ec642

Authored by Steven Whitehouse on Aug 02, 2011

We do not need to start a transaction unless the atime
check has proved positive. Also if we are going to flush
the complete ail list anyway, we might as well skip the
writeback for this specific inode's metadata, since that
will be done as part of the ail writeback process in an
order offering potentially more efficient I/O.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

1d4ec642

GFS2: Fix bug-trap in ail flush code · 75549186

Authored by Steven Whitehouse on Aug 02, 2011

The assert was being tested under the wrong lock, a
legacy of the original code. Also, if it does trigger,
the resulting information was not always a lot of help.

This moves the assert under the correct lock and also
prints out more useful information for tracking down the
source of the problem.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

75549186

GFS2: Split data write & wait in fsync · 2f0264d5

Authored by Steven Whitehouse on Jul 27, 2011

Now that the data writing is part of fsync proper, we can split
the waiting part out and do it later on. This reduces the
number of waits that we do during fsync on average.

There is also no need to take the i_mutex unless we are flushing
metadata to disk, so we can move that to within the metadata
flushing code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

2f0264d5

GFS2: Clean up dir hash table reading · 4c28d338

Authored by Steven Whitehouse on Jul 26, 2011

Since there is now only a single caller to gfs2_dir_read_data()
and it has a number of constant arguments, we can factor
those out. Also some tests relating to the inode size were
being done twice.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

4c28d338

23 Aug, 2011 2 commits

block: separate priority boosting from REQ_META · 65299a3b

Authored by Christoph Hellwig on Aug 23, 2011

Add a new REQ_PRIO to let requests preempt others in the cfq I/O scheduler,
and leave REQ_META purely for marking requests as metadata in blktrace.

All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

65299a3b
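
The split described in 65299a3b can be pictured with a small user-space sketch. Only the flag names REQ_META and REQ_PRIO come from the commit message; the bit positions and the helpers `gets_sched_boost()` and `traced_as_metadata()` are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel assigns its own positions. */
#define REQ_META (1u << 0)  /* request carries metadata (seen by blktrace) */
#define REQ_PRIO (1u << 1)  /* request may be boosted by the I/O scheduler */

/* Hypothetical helper: after the split, only REQ_PRIO implies boosting. */
static bool gets_sched_boost(unsigned int flags)
{
    return (flags & REQ_PRIO) != 0;
}

/* Hypothetical helper: REQ_META alone only marks the request as metadata. */
static bool traced_as_metadata(unsigned int flags)
{
    return (flags & REQ_META) != 0;
}

/* Small self-check showing the intended separation of the two meanings. */
void req_flags_selfcheck(void)
{
    unsigned int xfs_style = REQ_META;             /* metadata, no boost */
    unsigned int other_fs  = REQ_META | REQ_PRIO;  /* metadata and boost */

    assert(traced_as_metadata(xfs_style) && !gets_sched_boost(xfs_style));
    assert(traced_as_metadata(other_fs) && gets_sched_boost(other_fs));
}
```

In this picture a caller that previously relied on REQ_META for both effects sets both bits, which matches the statement that existing callers other than XFS were updated to also set REQ_PRIO.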

block: remove READ_META and WRITE_META · 5dc06c5a

Authored by Christoph Hellwig on Aug 23, 2011

Replace all occurrences of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

5dc06c5a

01 Aug, 2011 2 commits

switch posix_acl_equiv_mode() to umode_t * · d6952123

Authored by Al Viro on Jul 23, 2011

... so that &inode->i_mode could be passed to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d6952123

switch posix_acl_create() to umode_t * · d3fb6120

Authored by Al Viro on Jul 23, 2011

so we can pass &inode->i_mode to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d3fb6120

27 Jul, 2011 1 commit

atomic: use <linux/atomic.h> · 60063497

Authored by Arun Sharma on Jul 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

60063497

26 Jul, 2011 5 commits

GFS2: Fix mount hang caused by certain access pattern to sysfs files · 19237039

Authored by Steven Whitehouse on Jul 26, 2011

Depending upon the order of userspace/kernel during the
mount process, this can result in a hang without the
_all version of the completion.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

19237039

fs: take the ACL checks to common code · 4e34e719

Authored by Christoph Hellwig on Jul 23, 2011

Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss. This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4e34e719

kill boilerplates around posix_acl_create_masq() · 826cae2f

Authored by Al Viro on Jul 23, 2011

new helper: posix_acl_create(&acl, gfp, mode_p).  Replaces acl with
modified clone, on failure releases acl and replaces with NULL.
Returns 0 or -ve on error.  All callers of posix_acl_create_masq()
switched.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

826cae2f
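
The calling contract described in 826cae2f (replace the ACL with a modified clone, or release it and leave NULL on failure) can be modelled in user space. This is only a sketch under stated assumptions: `struct toy_acl`, `toy_acl_create()` and the `fail_clone` switch are hypothetical, the gfp argument is omitted, and the bitwise AND merely stands in for the real masking step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for struct posix_acl; not the kernel structure. */
struct toy_acl {
    unsigned int mode_bits;
};

/*
 * Hypothetical model of the helper's contract: on success *acl points to a
 * modified clone and the original is released; on failure the original is
 * released and *acl becomes NULL. Returns 0 or a negative errno.
 * fail_clone simulates an allocation failure.
 */
static int toy_acl_create(struct toy_acl **acl, unsigned int *mode_p,
                          int fail_clone)
{
    struct toy_acl *clone = fail_clone ? NULL : malloc(sizeof(*clone));

    if (clone) {
        /* Stand-in for the masking step: restrict ACL and mode together. */
        clone->mode_bits = (*acl)->mode_bits & *mode_p;
        *mode_p = clone->mode_bits;
    }
    free(*acl);    /* the caller's original is always consumed */
    *acl = clone;  /* NULL on failure */
    return clone ? 0 : -ENOMEM;
}

/* Self-check: callers no longer need their own clone/release boilerplate. */
void toy_acl_selfcheck(void)
{
    struct toy_acl *acl = malloc(sizeof(*acl));
    unsigned int mode = 0750;

    assert(acl);
    acl->mode_bits = 0770;
    assert(toy_acl_create(&acl, &mode, 0) == 0);
    assert(acl != NULL && mode == 0750);
    assert(toy_acl_create(&acl, &mode, 1) == -ENOMEM);
    assert(acl == NULL);
}
```

The point of the shape is that ownership is settled inside the helper on both paths, so a caller only has to check the return value.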

kill boilerplate around posix_acl_chmod_masq() · bc26ab5f

Authored by Al Viro on Jul 23, 2011

new helper: posix_acl_chmod(&acl, gfp, mode).  Replaces acl with modified
clone or with NULL if that has failed; returns 0 or -ve on error.  All
callers of posix_acl_chmod_masq() switched to that - they'd been doing
exactly the same thing.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bc26ab5f

vfs: move ACL cache lookup into generic code · e77819e5

Authored by Linus Torvalds on Jul 22, 2011

This moves logic for checking the cached ACL values from low-level
filesystems into generic code.  The end result is a streamlined ACL
check that doesn't need to load the inode->i_op->check_acl pointer at
all for the common cached case.

The filesystems also don't need to check for a non-blocking RCU walk
case in their acl_check() functions, because that is all handled at a
VFS layer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e77819e5

21 Jul, 2011 3 commits

simplify gfs2_lookup() · 6c673ab3

Authored by Al Viro on Jul 17, 2011

d_splice_alias() will DTRT when given NULL or ERR_PTR
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6c673ab3
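
Why a callee that accepts NULL or an error pointer simplifies its caller (6c673ab3) can be shown with a toy user-space model of the error-pointer idiom. Everything prefixed `toy_` is hypothetical and only imitates the idea; it is not the kernel's d_splice_alias() and makes no claim about its internals beyond what the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_dentry {
    const void *inode;  /* NULL means a negative entry */
    int hashed;
};

/* Toy splice: pass an error straight back, otherwise attach the inode. */
static void *toy_splice(void *inode, struct toy_dentry *dentry)
{
    if (toy_is_err(inode))
        return inode;
    dentry->inode = inode;  /* NULL is accepted and yields a negative entry */
    dentry->hashed = 1;
    return NULL;            /* this toy model never finds an alias */
}

/* With such a callee, the lookup needs no NULL or error special-casing. */
static void *toy_lookup(void *lookup_result, struct toy_dentry *dentry)
{
    return toy_splice(lookup_result, dentry);
}

void toy_lookup_selfcheck(void)
{
    struct toy_dentry d = { NULL, 0 };
    int inode_obj = 0;

    assert(toy_ptr_err(toy_lookup(toy_err_ptr(-ENOENT), &d)) == -ENOENT);
    assert(d.hashed == 0);
    assert(toy_lookup(NULL, &d) == NULL && d.hashed == 1 && d.inode == NULL);
    assert(toy_lookup(&inode_obj, &d) == NULL && d.inode == &inode_obj);
}
```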

fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82

Authored by Josef Bacik on Jul 16, 2011

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

02c24a82

fs: move inode_dio_wait calls into ->setattr · 562c72aa

Authored by Christoph Hellwig on Jun 24, 2011

Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio references from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

562c72aa

20 Jul, 2011 5 commits

->permission() sanitizing: don't pass flags to ->permission() · 10556cb2

Authored by Al Viro on Jun 20, 2011

not used by the instances anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

10556cb2

->permission() sanitizing: don't pass flags to generic_permission() · 2830ba7f

Authored by Al Viro on Jun 20, 2011

redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2830ba7f

->permission() sanitizing: don't pass flags to ->check_acl() · 7e40145e

Authored by Al Viro on Jun 20, 2011

not used in the instances anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7e40145e

->permission() sanitizing: pass MAY_NOT_BLOCK to ->check_acl() · 9c2c7039

Authored by Al Viro on Jun 20, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9c2c7039

kill check_acl callback of generic_permission() · 178ea735

Authored by Al Viro on Jun 20, 2011

its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

178ea735

15 Jul, 2011 4 commits

GFS2: combine duplicated block freeing routines · 46fcb2ed

Authored by Eric Sandeen on Jun 23, 2011

__gfs2_free_data and __gfs2_free_meta are almost identical, and
can be trivially combined.

[This is as per Eric's original patch minus gfs2_free_data() which had
 no callers left and plus the conversion of the bmap.c calls to these
 functions. All in all, a nice clean up]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

46fcb2ed

GFS2: Add S_NOSEC support · 9964afbb

Authored by Steven Whitehouse on Jun 16, 2011

This adds S_NOSEC support to GFS2. We set/reset the flag either when
a user calls setattr or when we have just regained the glock
from another node. The flag is only set if there are no xattrs
on the inode and there is no suid bit set.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>

9964afbb
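
The rule stated in 9964afbb (the flag is only set if there are no xattrs on the inode and no suid bit) amounts to a small predicate. The sketch below is a user-space model under stated assumptions: `struct toy_inode`, `toy_update_nosec()` and the `TOY_` constants are hypothetical names, not the GFS2 code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_S_ISUID 04000u      /* set-user-ID bit in the mode */
#define TOY_S_NOSEC (1u << 0)   /* illustrative flag bit, not the kernel value */

/* Toy inode carrying only what the rule needs. */
struct toy_inode {
    unsigned int mode;
    unsigned int flags;
    bool has_xattrs;
};

/*
 * Hypothetical helper: set the flag when there are no xattrs and no suid
 * bit, otherwise reset it. It would be called after a setattr or after
 * the glock has been regained from another node.
 */
static void toy_update_nosec(struct toy_inode *inode)
{
    if (!inode->has_xattrs && !(inode->mode & TOY_S_ISUID))
        inode->flags |= TOY_S_NOSEC;
    else
        inode->flags &= ~TOY_S_NOSEC;
}

void toy_nosec_selfcheck(void)
{
    struct toy_inode plain = { 0644, 0, false };
    struct toy_inode suid  = { 04755, TOY_S_NOSEC, false };
    struct toy_inode xattr = { 0644, TOY_S_NOSEC, true };

    toy_update_nosec(&plain);
    toy_update_nosec(&suid);
    toy_update_nosec(&xattr);
    assert(plain.flags & TOY_S_NOSEC);
    assert(!(suid.flags & TOY_S_NOSEC));
    assert(!(xattr.flags & TOY_S_NOSEC));
}
```

Recomputing the flag at both trigger points matters in a cluster because another node may have changed the mode or xattrs while this node did not hold the glock.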

GFS2: Automatically adjust glock min hold time · 7cf8dcd3

Authored by Bob Peterson on Jun 15, 2011

This patch is a performance improvement for GFS2 in a clustered
environment. It makes the glock hold time self-adjusting.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

7cf8dcd3

GFS2: Cache dir hash table in a contiguous buffer · 17d539f0

Authored by Steven Whitehouse on Jun 15, 2011

This patch adds a cache for the hash table to the directory code
in order to help simplify the way in which the hash table is
accessed. This is intended to be a first step towards introducing
some performance improvements in the directory code.

There are two follow ups that I'm hoping to see fairly shortly. One
is to simplify the hash table reading code now that we always read the
complete hash table, whether we want one entry or all of them. The
other is to introduce readahead on the heads of the hash chains
which are referred to from the table.

The hash table is a maximum of 128k in size, so it is not worth trying
to read it in small chunks.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

17d539f0

14 Jul, 2011 1 commit

GFS2: Resolve inode eviction and ail list interaction bug · 380f7c65

Authored by Steven Whitehouse on Jul 14, 2011

This patch contains a few misc fixes which resolve a recently
reported issue. This patch has been a real team effort and has
received a lot of testing.

The first issue is that the ail lock needs to be held over a few
more operations. The lock that's added into gfs2_releasepage() may
possibly be a candidate for replacing with RCU at some future
point, but at this stage we've gone for the obvious fix.

The second issue is that gfs2_write_inode() can end up calling
a glock recursively when called from gfs2_evict_inode() via the
syncing code, so it needs a guard added.

The third issue is that we either need to not truncate the metadata
pages of inodes which have zero link count, but which we cannot
deallocate due to them still being in use by other nodes, or we need
to ensure that those pages have all made it through the journal and
ail lists first. This patch takes the former approach, but the
latter has also been tested and there is nothing to choose between
them performance-wise. So again, we could revise that decision
in the future.

Also, the inode eviction process is now better documented.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
Reported-by: Barry J. Marson <bmarson@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>

380f7c65

12 Jul, 2011 2 commits

GFS2: Fix race during filesystem mount · 3942ae53

Authored by Steven Whitehouse on Jul 11, 2011

There is a potential race during filesystem mounting which has recently
been reported. It occurs when the userland gfs_controld is able to
process requests fast enough that it tries to use the sysfs interface
before the lock module is properly initialised. This is a pretty
unusual case as normally the lock module initialisation is very quick
compared with gfs_controld.

This patch adds an interruptible completion which is used to ensure that
userland will wait for the initialisation of the lock module to
complete.

There are other potential solutions to this problem, but this is the
quickest at this stage and has been tested both with and without
mount.gfs2 present in the system.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Booher <dbooher@adams.net>

3942ae53

GFS2: force a log flush when invalidating the rindex glock · 1ce53368

Authored by Benjamin Marzinski on Jun 13, 2011

Right now, there is nothing that forces the log to get flushed when a node
drops its rindex glock so that another node can grow the filesystem. If the
log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the
following way.

A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is
dropped so the other node can grow the filesystem. When the node reacquires the
rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being
removed from the list by gfs2_log_flush().

This code simply forces a log flush when the rindex glock is invalidated,
solving the problem.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

1ce53368

26 May, 2011 1 commit

gfs2: Drop __TIME__ usage · 8d2c50e3

Authored by Michal Marek on Apr 01, 2011

The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Michal Marek <mmarek@suse.cz>

8d2c50e3

25 May, 2011 2 commits

vmscan: change shrinker API by passing shrink_control struct · 1495f230

Authored by Ying Han on May 24, 2011

Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1495f230
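
The API change in 1495f230 follows a common pattern: bundle a callback's parameters into one struct so that later additions do not touch every implementation. The sketch below is a user-space model only. The `toy_` types are hypothetical, and the two field names are an assumption about what was consolidated, not a quotation of the kernel header.

```c
#include <assert.h>

/* Toy argument bundle; field names are assumed for illustration. */
struct toy_shrink_control {
    unsigned int gfp_mask;     /* allocation context of the caller */
    unsigned long nr_to_scan;  /* how many objects to try to free */
};

/* Toy cache that just counts its objects. */
struct toy_cache {
    unsigned long nr_objects;
};

/*
 * Callback taking one struct instead of separate scalar arguments.
 * Frees up to nr_to_scan objects and returns how many remain, so a
 * request with nr_to_scan == 0 simply reports the current count.
 */
static unsigned long toy_shrink(struct toy_cache *cache,
                                const struct toy_shrink_control *sc)
{
    unsigned long n = sc->nr_to_scan < cache->nr_objects
                          ? sc->nr_to_scan
                          : cache->nr_objects;

    cache->nr_objects -= n;
    return cache->nr_objects;
}

void toy_shrink_selfcheck(void)
{
    struct toy_cache cache = { 10 };
    struct toy_shrink_control query = { 0, 0 };
    struct toy_shrink_control scan = { 0, 4 };
    struct toy_shrink_control drain = { 0, 100 };

    assert(toy_shrink(&cache, &query) == 10);  /* count only */
    assert(toy_shrink(&cache, &scan) == 6);    /* 4 objects freed */
    assert(toy_shrink(&cache, &drain) == 0);   /* clamped to what is left */
}
```

Adding a field to the struct later leaves the signature of `toy_shrink()` untouched, which is the maintenance benefit the commit message points at.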

GFS2: Processes waiting on inode glock that no processes are holding · f90e5b5b

Authored by Bob Peterson on May 24, 2011

This patch fixes a race in the GFS2 glock state machine that may
result in lockups.  The symptom is that all nodes but one will
hang, waiting for a particular glock.  All the holder records
will have the "W" (Waiting) bit set.  The other node will
typically have the glock stuck in Exclusive mode (EX) with no
holder records, but the dinode will be cached.  In other words,
an entry with "I:" will appear in the glock dump for that glock,
but nothing else.

The race has to do with the glock "Pending Demote" bit, which
can be set, then immediately reset, thus losing the fact that
another node needs the glock.  The sequence of events is:

1. Something schedules the glock workqueue (e.g. glock request from fs)
2. The glock workqueue gets to the point between the test of the reply pending
bit and the spin lock:

        if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
                finish_xmote(gl, gl->gl_reply);
                drop_ref = 1;
        }
        down_read(&gfs2_umount_flush_sem);         <---- i.e. here
        spin_lock(&gl->gl_spin);

3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
            (b) the demote request which sets GLF_PENDING_DEMOTE

4. The following test is executed:

        if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
            gl->gl_state != LM_ST_UNLOCKED &&
            gl->gl_demote_state != LM_ST_EXCLUSIVE) {

This resets the pending demote flag, and gl->gl_demote_state is not equal to
exclusive, however because the reply from the dlm arrived after we checked for
the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE flag.

The patch closes the timing window by only transitioning the
"Pending demote" bit to the "demote" flag once we know the
other conditions (not unlocked and not exclusive) are met.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

f90e5b5b
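
The lost-flag sequence described in f90e5b5b, and the shape of the fix, can be replayed in a single-threaded user-space model. This is a sketch under stated assumptions: the `toy_` names are hypothetical, plain bit operations stand in for the kernel's atomic bit helpers, and no locking or holder tracking is modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

#define TOY_PENDING_DEMOTE (1u << 0)
#define TOY_DEMOTE         (1u << 1)

struct toy_glock {
    unsigned int flags;
    enum toy_state state;         /* current lock state */
    enum toy_state demote_state;  /* state another node asked us to drop to */
};

/* Racy shape: the pending bit is cleared before the other tests run. */
static void toy_work_racy(struct toy_glock *gl)
{
    bool pending = (gl->flags & TOY_PENDING_DEMOTE) != 0;

    gl->flags &= ~TOY_PENDING_DEMOTE;  /* models test_and_clear_bit() */
    if (pending && gl->state != TOY_UNLOCKED &&
        gl->demote_state != TOY_EXCLUSIVE)
        gl->flags |= TOY_DEMOTE;
}

/* Fixed shape: convert pending to demote only once all conditions hold. */
static void toy_work_fixed(struct toy_glock *gl)
{
    if ((gl->flags & TOY_PENDING_DEMOTE) &&
        gl->state != TOY_UNLOCKED &&
        gl->demote_state != TOY_EXCLUSIVE) {
        gl->flags &= ~TOY_PENDING_DEMOTE;
        gl->flags |= TOY_DEMOTE;
    }
}

void toy_glock_selfcheck(void)
{
    /* The reply has not been processed yet, so state is still unlocked. */
    struct toy_glock racy  = { TOY_PENDING_DEMOTE, TOY_UNLOCKED, TOY_UNLOCKED };
    struct toy_glock fixed = racy;

    toy_work_racy(&racy);
    assert(racy.flags == 0);                    /* demote request is lost */

    toy_work_fixed(&fixed);
    assert(fixed.flags == TOY_PENDING_DEMOTE);  /* request survives */

    fixed.state = TOY_EXCLUSIVE;                /* reply processed later */
    toy_work_fixed(&fixed);
    assert(fixed.flags == TOY_DEMOTE);          /* now acted upon */
}
```

In the racy variant the request from the other node disappears without a trace, which matches the reported symptom of a glock stuck in EX with no holders while every other node waits.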

22 May, 2011 1 commit

GFS2: Wait properly when flushing the ail list · 26b06a69

Authored by Steven Whitehouse on May 21, 2011

The ail flush code has always relied upon log flushing to prevent
it from spinning needlessly. This fixes it to wait on the last
I/O request submitted (we don't need to wait for all of it)
instead of either spinning with io_schedule or sleeping.

As a result cpu usage of gfs2_logd is much reduced with certain
workloads.
Reported-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

26b06a69

21 May, 2011 1 commit

GFS2: Wipe directory hash table metadata when deallocating a directory · 6d3117b4

Authored by Steven Whitehouse on May 21, 2011

The deallocation code for directories in GFS2 is largely divided into
two parts. The first part deallocates any directory leaf blocks and
marks the directory as being a regular file when that is complete. The
second stage was identical to deallocating regular files.

Regular files have their data blocks in a different
address space to directories, and thus what would have been normal data
blocks in a regular file (the hash table in a GFS2 directory) were
deallocated correctly. However, a reference to these blocks was left in the
journal (assuming of course that some previous activity had resulted in
those blocks being in the journal or ail list).

This patch uses the i_depth as a test of whether the inode is an
exhash directory (we cannot test the inode type as that has already
been changed to a regular file at this stage in deallocation).

The original issue was reported by Chris Hertel as an issue he encountered
running bonnie++.
Reported-by: Christopher R. Hertel <crh@samba.org>
Cc: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

6d3117b4

13 May, 2011 3 commits

GFS2: Move all locking inside the inode creation function · f2741d98

Authored by Steven Whitehouse on May 13, 2011

Now that there are no longer any exceptions to the normal inode
creation code path, we can move the parts of the locking code
which were duplicated in mkdir/mknod/create/symlink into the
inode create function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

f2741d98

GFS2: Clean up symlink creation · 160b4026

Authored by Steven Whitehouse on May 13, 2011

This moves the symlink specific parts of inode creation
into the function where we initialise the rest of the
dinode. As a result we have one less place where we need
to look up the inode's buffer.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

160b4026

GFS2: Clean up mkdir · e2d0a13b

Authored by Steven Whitehouse on May 13, 2011

This moves the initialisation of the directory into the inode
creation functions to avoid having to duplicate the lookup
of the inode's buffer.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

e2d0a13b