Commits · 7b5e3d5fcf0d6fce66050bd0313a7dc2ae4abc62 · openanolis / cloud-kernel

20 Sep, 2010 7 commits

GFS2: Don't enforce min hold time when two demotes occur in rapid succession · 7b5e3d5f

Committed by Steven Whitehouse on Sep 03, 2010

Due to the design of the VFS, it is quite usual for operations on GFS2
to consist of a lookup (requiring a shared lock) followed by an
operation requiring an exclusive lock. If a remote node has cached an
exclusive lock, then it will receive two demote events in rapid succession:
first to a shared lock and then to unlocked. The existing min hold time
code was triggering in this case, even if the node was otherwise idle
since the state change time was being updated by the initial demote.

This patch introduces logic to skip the min hold timer in the case that
a "double demote" of this kind has occurred. The min hold timer will
still be used in all other cases.

A new glock flag is introduced which is used to keep track of whether
there have been any newly queued holders since the last glock state
change. The min hold time is only applied if the flag is set.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>

7b5e3d5f
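
The flag logic this commit describes can be sketched in user-space C. This is only an illustrative model under stated assumptions, not the kernel code: `struct glock_model`, `GLF_QUEUED_MODEL`, `MIN_HOLD_TICKS` and the three helpers are hypothetical names, and the 200-tick hold time is an arbitrary value chosen for the example.

```c
#include <assert.h>

/* Hypothetical model of a glock; names and values are illustrative only. */
#define GLF_QUEUED_MODEL (1u << 0)  /* a holder was queued since the last state change */
#define MIN_HOLD_TICKS   200UL      /* assumed minimum hold time for this sketch */

struct glock_model {
	unsigned int flags;
	unsigned long changed_at;   /* tick of the last state change */
};

/* A new holder was queued: remember that the lock is actually in use. */
static void model_queue_holder(struct glock_model *gl)
{
	gl->flags |= GLF_QUEUED_MODEL;
}

/* The lock changed state: record the time and clear the "queued" flag. */
static void model_state_change(struct glock_model *gl, unsigned long now)
{
	gl->changed_at = now;
	gl->flags &= ~GLF_QUEUED_MODEL;
}

/* Returns how many ticks an incoming demote request should be delayed. */
static unsigned long model_demote_delay(const struct glock_model *gl,
					unsigned long now)
{
	unsigned long held;

	assert(now >= gl->changed_at);
	held = now - gl->changed_at;

	if (!(gl->flags & GLF_QUEUED_MODEL))
		return 0;   /* no holders since the last change: skip the timer */
	if (held >= MIN_HOLD_TICKS)
		return 0;   /* already held long enough */
	return MIN_HOLD_TICKS - held;
}
```

In this model the second demote of a "double demote" finds the flag cleared by the first state change and is not delayed, while a demote that follows a newly queued holder still waits out the remainder of the hold time.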

GFS2: Fix whitespace in previous patch · fe08d5a8

Committed by Steven Whitehouse on Aug 23, 2010

Removes the offending space
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

fe08d5a8

GFS2: fallocate support · 3921120e

Committed by Benjamin Marzinski on Aug 20, 2010

This patch adds support for fallocate to gfs2. Since gfs2 does not support
uninitialized data blocks, it must write out zeros to all the blocks. However,
since it does not need to lock any pages to read from, gfs2 can write out the
zero blocks much more efficiently. On a moderately full filesystem, fallocate
works around 5 times faster on average. The fallocate call also allows gfs2 to
add blocks to the file without changing the filesize, which will make it
possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
grow a completely full filesystem.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

3921120e

GFS2: Add a bug trap in allocation code · 9a3f236d

Committed by Steven Whitehouse on Aug 23, 2010

This adds a check to ensure that, if we reach the block allocator,
we don't try to proceed when there is no alloc structure
hanging off the inode. This should only happen if there is a bug
in GFS2. The error return code is distinctive in order that it
will be easily spotted.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

9a3f236d
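
A bug trap of this kind might look like the following user-space sketch. Everything here is an assumption made for illustration: `struct inode_model`, `struct alloc_ctx` and `model_alloc_block()` are hypothetical, and `-ECANCELED` is simply one plausible choice of a "distinctive" error code, not necessarily the one the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inode and its allocation context. */
struct alloc_ctx {
	unsigned int reserved_blocks;
};

struct inode_model {
	struct alloc_ctx *alloc;   /* NULL means no allocation was set up */
};

/*
 * Returns 0 and consumes one reserved block, or a negative errno.
 * -ECANCELED is an assumed "distinctive" code chosen for this sketch, so
 * that a missing alloc structure stands out from ordinary failures.
 */
static int model_alloc_block(struct inode_model *ip)
{
	assert(ip != NULL);

	if (ip->alloc == NULL)
		return -ECANCELED;   /* only reachable through a caller bug */
	if (ip->alloc->reserved_blocks == 0)
		return -ENOSPC;

	ip->alloc->reserved_blocks--;
	return 0;
}
```

Returning an unusual code instead of dereferencing the NULL pointer turns a crash into an error that is easy to recognise in logs and bug reports.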

GFS2: No longer experimental · 820969f3

Committed by Steven Whitehouse on Aug 11, 2010

I think the time has arrived to remove the experimental tag
from GFS2.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

820969f3

GFS2: Remove i_disksize · a2e0f799

Committed by Steven Whitehouse on Aug 11, 2010

With the update of the truncate code, ip->i_disksize and
inode->i_size are merely copies of each other. This means
we can remove ip->i_disksize and use inode->i_size exclusively,
reducing the size of a GFS2 inode by 8 bytes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

a2e0f799

GFS2: New truncate sequence · ff8f33c8

Committed by Steven Whitehouse on Aug 11, 2010

This updates GFS2's truncate code to use the new truncate
sequence correctly. This is a stepping stone to being
able to remove ip->i_disksize in favour of using i_size
everywhere now that the two sizes are always identical.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>

ff8f33c8

17 Sep, 2010 1 commit

GFS2: gfs2_logd should be using interruptible waits · 5f487490

Committed by Steven Whitehouse on Sep 09, 2010

Looks like this crept in, in a recent update.
Reported-by: Krzysztof Urbaniak <urban@bash.org.pl>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

5f487490

10 Aug, 2010 6 commits

Make ->drop_inode() just return whether inode needs to be dropped · 45321ac5

Committed by Al Viro on Jun 07, 2010

... and let iput_final() do the actual eviction or retention
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

45321ac5

switch gfs2 to ->evict_inode() · d5c1515c

Committed by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d5c1515c

simplify checks for I_CLEAR/I_FREEING · a4ffdde6

Committed by Al Viro on Jun 02, 2010

add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information.  As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a4ffdde6
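
The simplification described in this commit can be illustrated with a small user-space sketch. The bit values and the `_MODEL` names are assumptions made for the example (the kernel's actual definitions may differ), and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the kernel's actual values may differ. */
#define I_FREEING_MODEL (1u << 0)
#define I_CLEAR_MODEL   (1u << 1)

/* Old scheme: I_CLEAR replaced I_FREEING, so callers had to test both bits. */
static bool going_away_old(unsigned int state)
{
	return (state & (I_FREEING_MODEL | I_CLEAR_MODEL)) != 0;
}

/* New scheme: I_CLEAR is set in addition to I_FREEING, so one bit suffices. */
static bool going_away_new(unsigned int state)
{
	/* Invariant after the change: I_CLEAR never appears without I_FREEING. */
	assert(!(state & I_CLEAR_MODEL) || (state & I_FREEING_MODEL));
	return (state & I_FREEING_MODEL) != 0;
}
```

Because the new scheme keeps `I_FREEING` set for the rest of the inode's life, most callers no longer need to know that `I_CLEAR` exists at all.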

check ATTR_SIZE contraints in inode_change_ok · 2c27c65e

Committed by Christoph Hellwig on Jun 04, 2010

Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok.  Also clean up and document inode_change_ok
to make this obvious.

As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error.  This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.

Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.

Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2c27c65e

remove inode_setattr · 1025774c

Committed by Christoph Hellwig on Jun 04, 2010

Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1025774c

sort out blockdev_direct_IO variants · eafdc7d1

Committed by Christoph Hellwig on Jun 04, 2010

Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation for the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
parameters is shorter than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eafdc7d1

08 Aug, 2010 2 commits

block: unify flags for struct bio and struct request · 7b6d91da

Committed by Christoph Hellwig on Aug 07, 2010

Remove the current bio flags and reuse the request flags for the bio, too.
This makes it easier to trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superfluous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

7b6d91da

block: BARRIER request should imply SYNC · 41f2df62

Committed by Christoph Hellwig on Jun 17, 2010

A barrier request should by definition have priority in get_request
and let the queue be unplugged immediately as it's blocking all forward
progress due to the queue draining.

Most filesystems already get this implicitly by the way submit_bh
treats the buffer_ordered flag, and gfs2 sets it explicitly.  But btrfs
and XFS are still forgetting to set the flag, as is blkdev_issue_flush
and some places in DM/MD.

For XFS on metadata heavy workloads this gives a consistent speedup
in the 2-3% range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

41f2df62

02 Aug, 2010 1 commit

GFS2: Fix recovery stuck bug (try #2) · 0809f6ec

Committed by Steven Whitehouse on Aug 02, 2010

This is a clean up of the code which deals with LM_FLAG_NOEXP
which aims to remove any possible race conditions by using
gl_spin to cover the gap between testing for the LM_FLAG_NOEXP
and the GLF_FROZEN flag.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

0809f6ec

30 Jul, 2010 1 commit

GFS2: Fix typo in stuffed file data copy handling · c639d5d8

Committed by Abhijith Das on Jul 30, 2010

trunc_start() in bmap.c incorrectly uses sizeof(struct gfs2_inode) instead of
sizeof(struct gfs2_dinode).
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

c639d5d8

29 Jul, 2010 7 commits

Revert "GFS2: recovery stuck on transaction lock" · 7cdee5db

Committed by Steven Whitehouse on Jul 29, 2010

This reverts commit b7dc2df5.

The initial patch didn't quite work since it doesn't cover all
the possible routes by which the GLF_FROZEN flag might be set.
A revised fix is coming up in the next patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

7cdee5db

GFS2: Make "try" lock not try quite so hard · d5341a92

Committed by Steven Whitehouse on Jul 23, 2010

This looks like a big change, but in reality it's only a single line of actual
code change; the rest is just moving a function to before its new caller.
The "try" flag for glocks is a rather subtle and delicate setting since it
requires that the state machine tries just hard enough to ensure that it has
a good chance of getting the requested lock, but not so hard that the
request can land up blocked behind another.

The patch adds in an additional check which will fail any queued try
locks if there is another request blocking the try lock request which
is not granted and compatible, nor in progress already. The check is made
only after all pending locks which may be granted have been granted.

I've checked this with the reproducer for the reported flock bug which
this is intended to fix, and it now passes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

d5341a92

GFS2: remove dependency on __GFP_NOFAIL · 4244b52e

Committed by David Rientjes on Jul 20, 2010

The k[mc]allocs in dir_split_leaf() and dir_double_exhash() are failable,
so remove __GFP_NOFAIL from their masks.

Cc: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

4244b52e

GFS2: Simplify gfs2_write_alloc_required · 461cb419

Committed by Bob Peterson on Jun 24, 2010

Function gfs2_write_alloc_required always returned zero as its
return code.  Therefore, it doesn't need to return a return code
at all.  Given that, we can use the return value to return whether
or not the dinode needs block allocations rather than passing
that value in, which in turn simplifies a bunch of error checking.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

461cb419
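
The shape of this refactoring can be shown with a simplified before-and-after sketch. The structure and both functions below are hypothetical models written for illustration; they do not reproduce the real signature or logic of `gfs2_write_alloc_required`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified inode: only what the sketch needs. */
struct dinode_model {
	unsigned long long allocated;   /* bytes already backed by blocks */
};

/* Before: the answer is passed back via a pointer; the return code is always 0. */
static int alloc_required_old(const struct dinode_model *ip,
			      unsigned long long offset, unsigned int len,
			      int *alloc_required)
{
	assert(ip != NULL && alloc_required != NULL);
	*alloc_required = (offset + len > ip->allocated);
	return 0;
}

/* After: the return value is the answer, so callers need no error check. */
static bool alloc_required_new(const struct dinode_model *ip,
			       unsigned long long offset, unsigned int len)
{
	assert(ip != NULL);
	return offset + len > ip->allocated;
}
```

A caller of the old form needs a local variable plus an error check that can never fire; a caller of the new form can test the result directly in an `if`.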

GFS2: Wait for journal id on mount if not specified on mount command line · ba6e9364

Committed by Steven Whitehouse on Jun 14, 2010

This patch implements a wait for the journal id in the case that it has
not been specified on the command line. This is to allow the future
removal of the mount.gfs2 helper. The journal id would instead be
directly communicated by gfs_controld to the file system. Here is a
comparison of the two systems:

Current:
1. mount calls mount.gfs2
2. mount.gfs2 connects to gfs_controld to retrieve the journal id
3. mount.gfs2 adds the journal id to the mount command line and calls
the mount system call
4. gfs_controld receives the status of the mount request via a uevent

Proposed:
1. mount calls the mount system call (no mount.gfs2 helper)
2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
about already
3. gfs_controld assigns a journal id to it via sysfs
4. the mount system call then completes as normal (sending a uevent
according to status)

The advantage of the proposed system is that it is completely backward
compatible with the current system both at the kernel and at the
userland levels. The "first" parameter can also be set the same way,
with the restriction that it must be set before the journal id is
assigned.

In addition, if mount becomes stuck waiting for a reply from
gfs_controld which never arrives, then it is killable and will abort the
mount gracefully.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

ba6e9364

GFS2: Use nobh_writepage · 30116ff6

Committed by Steven Whitehouse on Jun 14, 2010

Use nobh_writepage rather than calling mpage_writepage directly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>

30116ff6

GFS2: Use kmalloc when possible for ->readdir() · d2a97a4e

Committed by Steven Whitehouse on Jul 28, 2010

If we don't need a huge amount of memory in ->readdir() then
we can use kmalloc rather than vmalloc to allocate it. This
should cut down on the greater overheads associated with
vmalloc for smaller directories.

We may be able to eliminate vmalloc entirely at some stage,
but this is easy to do right away.

Also use GFP_NOFS to avoid any issues with deleting inodes
while under a glock, and follow a suggestion from Linus to factor out
the alloc/dealloc.

I've given this a test with a variety of different sized
directories and it seems to work ok.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d2a97a4e
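
The idea of choosing an allocator by size and factoring out the alloc/dealloc pair can be sketched in user space. All names here are hypothetical: `small_alloc()` and `large_alloc()` merely stand in for `kmalloc()` and `vmalloc()` (both wrap `malloc()` so the example runs anywhere), and `SMALL_ALLOC_MAX` is an arbitrary cut-off, not the kernel's limit.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed cut-off for the sketch; the kernel applies its own limit. */
#define SMALL_ALLOC_MAX 4096u

/* Stand-ins for kmalloc()/kfree() and vmalloc()/vfree(). */
static void *small_alloc(size_t n) { return malloc(n); }
static void small_free(void *p)    { free(p); }
static void *large_alloc(size_t n) { return malloc(n); }
static void large_free(void *p)    { free(p); }

struct dir_buf {
	void *mem;
	int is_large;   /* remembers which allocator produced mem */
};

/* Picks the cheaper allocator when the request is small enough. */
static int dir_buf_alloc(struct dir_buf *b, size_t n)
{
	assert(b != NULL);
	b->is_large = n > SMALL_ALLOC_MAX;
	b->mem = b->is_large ? large_alloc(n) : small_alloc(n);
	return b->mem ? 0 : -1;
}

/* Releases the buffer with the routine that matches its allocator. */
static void dir_buf_free(struct dir_buf *b)
{
	assert(b != NULL);
	if (b->is_large)
		large_free(b->mem);
	else
		small_free(b->mem);
	b->mem = NULL;
}
```

Keeping the choice inside one alloc/free pair means the caller never has to remember which allocator was used.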

23 Jul, 2010 1 commit

gfs2: use workqueue instead of slow-work · 6ecd7c2d

Committed by Tejun Heo on Jul 20, 2010

Workqueue can now handle high concurrency.  Convert gfs to use
workqueue instead of slow-work.

* Steven pointed out that recovery path might be run from allocation
  path and thus requires forward progress guarantee without memory
  allocation.  Create and use gfs_recovery_wq with rescuer.  Please
  note that forward progress wasn't guaranteed with slow-work.

* Updated to use non-reentrant workqueue.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>

6ecd7c2d

21 Jul, 2010 1 commit

quota: Clean up the namespace in dqblk_xfs.h · ade7ce31

Committed by Christoph Hellwig on Jun 04, 2010

Almost all identifiers use the FS_* namespace, so rename the missing few
XFS_* ones to FS_* as well.  Without this some people might get upset
about having too many XFS names in generic code.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

ade7ce31

19 Jul, 2010 1 commit

mm: add context argument to shrinker callback · 7f8275d0

Committed by Dave Chinner on Jul 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

7f8275d0
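
The embedding pattern this commit enables can be demonstrated in user-space C. The `container_of()` macro below is a plain `offsetof`-based equivalent of the kernel macro, and `struct shrinker_model`, `struct fs_cache` and `fs_cache_shrink()` are hypothetical names with a much simpler callback signature than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified shrinker: the callback receives the shrinker itself. */
struct shrinker_model {
	int (*shrink)(struct shrinker_model *s, int nr_to_scan);
};

/* Hypothetical per-filesystem cache that embeds its own shrinker. */
struct fs_cache {
	int nr_objects;
	struct shrinker_model shrinker;
};

/* Frees up to nr_to_scan objects and returns how many remain. */
static int fs_cache_shrink(struct shrinker_model *s, int nr_to_scan)
{
	/* Recover the enclosing cache from the embedded shrinker pointer. */
	struct fs_cache *cache = container_of(s, struct fs_cache, shrinker);

	assert(nr_to_scan >= 0);
	if (nr_to_scan > cache->nr_objects)
		nr_to_scan = cache->nr_objects;
	cache->nr_objects -= nr_to_scan;
	return cache->nr_objects;
}
```

Because each cache instance carries its own shrinker, two filesystems can register the same callback and still be shrunk independently, with no global state involved.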

15 Jul, 2010 5 commits

GFS2: rename causes kernel Oops · 728a756b

Committed by Bob Peterson on Jul 14, 2010

This patch fixes a kernel Oops in the GFS2 rename code.

The problem was in the way the gfs2 directory code was trying
to re-use sentinel directory entries.

In the failing case, gfs2's rename function was renaming a
file to another name that had the same non-trivial length.
The file being renamed happened to be the first directory
entry on the leaf block.

First, the rename code (gfs2_rename in ops_inode.c) found the
original directory entry and decided it could do its job by
simply replacing the directory entry with another.  Therefore
it determined correctly that no block allocations were needed.

Next, the rename code deleted the old directory entry prior to
replacing it with the new name.  Therefore, the soon-to-be
replaced directory entry was temporarily made into a directory
entry "sentinel" or a place holder at the start of a leaf block.

Lastly, it went to re-add the replacement directory entry in
that leaf block.  However, when gfs2_dirent_find_space was
looking for space in the leaf block, it used the wrong value
for the sentinel.  That threw off its calculations so later
it decides it can't really re-use the sentinel and therefore
must allocate a new leaf block.  But because it previously decided
to re-use the directory entry, it didn't waste the time to
grab a new block allocation for the inode.  Therefore, the
inode's i_alloc pointer was still NULL and it crashes trying to
reference it.

In the case of sentinel directory entries, the entire dirent is
reused, not just the "free space" portion of it, and therefore
the function gfs2_dirent_find_space should use the value 0
rather than GFS2_DIRENT_SIZE(0) for the actual dirent size.

Fixing this calculation enables the reproducer programs to work
properly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

728a756b
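
The space calculation at the heart of this fix can be modelled with a few lines of C. This is a sketch under stated assumptions: the 40-byte header, the 8-byte rounding in `DIRENT_SIZE()`, `struct dirent_model` and `has_space()` are all illustrative stand-ins rather than the real on-disk layout or the real `gfs2_dirent_find_space`. The `fixed` parameter exists only so both behaviours can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed header size and 8-byte alignment, for illustration only. */
#define DIRENT_HDR 40u
#define DIRENT_SIZE(name_len) ((DIRENT_HDR + (name_len) + 7u) & ~7u)

struct dirent_model {
	unsigned int rec_len;    /* total bytes owned by this entry */
	unsigned int name_len;   /* length of the stored name */
	bool sentinel;           /* true for an unused placeholder entry */
};

/* Returns true if a name of new_len bytes fits in the entry's spare room. */
static bool has_space(const struct dirent_model *d, unsigned int new_len,
		      bool fixed)
{
	unsigned int required = DIRENT_SIZE(new_len);
	unsigned int actual = DIRENT_SIZE(d->name_len);

	if (d->sentinel) {
		/*
		 * A sentinel is reused whole, so none of it is "in use".
		 * The buggy variant still counted its header as occupied.
		 */
		actual = fixed ? 0 : DIRENT_SIZE(0);
	}
	assert(d->rec_len >= actual);
	return d->rec_len - actual >= required;
}
```

With a 56-byte sentinel and a 16-character replacement name, the buggy calculation sees only 16 free bytes and reports no room, while the corrected one sees all 56 bytes and allows the entry to be reused in place.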

GFS2: BUG in gfs2_adjust_quota · 8b421601

Committed by Abhijith Das on Jul 04, 2010

HighMem pages on i686 do not get mapped to the buffer_heads and this was
causing a NULL pointer dereference when we were trying to memset page buffers
to zero.
We now use zero_user() that kmaps the page and directly manipulates page data.
This patch also fixes a boundary condition that was incorrect.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

8b421601

GFS2: Fix kernel NULL pointer dereference by dlm_astd · b1becbde

Committed by Bob Peterson on Jun 17, 2010

This patch fixes a problem in an error path when looking
up dinodes.  There are two sister-functions, gfs2_inode_lookup
and gfs2_process_unlinked_inode.  Both functions acquire and
hold the i_iopen glock for the dinode being looked up. The last
thing they try to do is hold the i_gl glock for the dinode.
If that glock fails for some reason, the error path was
incorrectly calling gfs2_glock_put for the i_iopen glock twice.
This resulted in the glock being prematurely freed.  The
"minimum hold time" usually kept the glock in memory, but the
lock interface to dlm (aka lock_dlm) freed its memory for the
glock.  In some circumstances, it would cause dlm's dlm_astd daemon
to try to call the bast function for the freed lock_dlm memory,
which resulted in a NULL pointer dereference.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

b1becbde
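
The class of bug fixed here, an error path that drops a reference twice, can be illustrated with a toy reference counter. `struct ref_model`, `ref_get()`, `ref_put()` and `lookup_model()` are hypothetical and far simpler than real glock handling; the sketch only shows the balanced form of the error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reference-counted object standing in for a glock. */
struct ref_model {
	int count;
	bool freed;
};

static void ref_get(struct ref_model *r)
{
	assert(!r->freed);
	r->count++;
}

/* Drops one reference; the object is freed when the count reaches zero. */
static void ref_put(struct ref_model *r)
{
	assert(!r->freed && r->count > 0);
	if (--r->count == 0)
		r->freed = true;
}

/*
 * Takes a reference on the first lock, then tries a second step.
 * On failure it releases exactly the one reference it took; issuing a
 * second ref_put() here would be the kind of imbalance described above.
 */
static int lookup_model(struct ref_model *iopen, bool second_step_fails)
{
	ref_get(iopen);
	if (second_step_fails) {
		ref_put(iopen);
		return -1;
	}
	return 0;   /* on success the reference stays with the caller */
}
```

An extra put on the failure path would eventually drive the count to zero while other users still hold the object, which is how a premature free like the one in this report comes about.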

GFS2: recovery stuck on transaction lock · b7dc2df5

Committed by Bob Peterson on Jun 23, 2010

This patch fixes bugzilla bug #590878: GFS2: recovery stuck on
transaction lock. We set the frozen flag on the glock when we receive
a completion that cannot be delivered due to blocked locks. At that
point we check to see whether the first waiting holder has the noexp
flag set. If the noexp lock is queued later, then we need to unfreeze
the glock at that point in time, namely, in the glock work function.

This patch was originally written by Steve Whitehouse, but since
he's on holiday, I'm submitting it. It's been well tested with a
complex recovery test called revolver.
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

b7dc2df5

GFS2: O_TRUNC not working on stuffed files across cluster · a8bf2bc2

Committed by Bob Peterson on Jun 24, 2010

This patch replaces a statement that got dropped out by accident.
Without the patch, truncates on stuffed (very small) files cause
those files to have an unpredictable size.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

a8bf2bc2

28 May, 2010 2 commits

kill spurious reference to vmtruncate · 15c6fd97

Committed by npiggin@suse.de on May 27, 2010

Lots of filesystems call vmtruncate despite not implementing the old
->truncate method.  Switch them to use simple_setsize and add some
comments about the truncate code where it seems fitting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

15c6fd97

drop unused dentry argument to ->fsync · 7ea80859

Committed by Christoph Hellwig on May 26, 2010

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7ea80859

24 May, 2010 1 commit

GFS2: Fix permissions checking for setflags ioctl() · 7df0e039

Committed by Steven Whitehouse on May 24, 2010

We should be checking for the ownership of the file for which
flags are being set, rather than just for write access.
Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

7df0e039

22 May, 2010 3 commits

gfs: constify xattr_handler · b7bb0a12

Committed by Stephen Hemminger on May 13, 2010

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b7bb0a12

quota: unify ->set_dqblk · c472b432

Committed by Christoph Hellwig on May 06, 2010

Pass the larger struct fs_disk_quota to the ->set_dqblk operation so
that the Q_SETQUOTA and Q_XSETQUOTA operations can be implemented
with a single filesystem operation and we can retire the ->set_xquota
operation.  The additional information (RT-subvolume accounting and
warn counts) are left zero for the VFS quota implementation.

Add new fieldmask values for setting the number of blocks and inodes
values which is required for the VFS quota, but wasn't for XFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

c472b432

quota: unify ->get_dqblk · b9b2dd36

Committed by Christoph Hellwig on May 06, 2010

Pass the larger struct fs_disk_quota to the ->get_dqblk operation so
that the Q_GETQUOTA and Q_XGETQUOTA operations can be implemented
with a single filesystem operation and we can retire the ->get_xquota
operation.  The additional information (RT-subvolume accounting and
warn counts) are left zero for the VFS quota implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

b9b2dd36

21 May, 2010 1 commit

GFS2: Don't "get" xattrs for ACLs when ACLs are turned off · f72f2d2e

Committed by Steven Whitehouse on May 21, 2010

This is to match ext3 behaviour. We should not allow getting of
xattrs relating to ACLs when ACLs are turned off.
Reported-by: Nate Straz <nstraz@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

f72f2d2e

openanolis / cloud-kernel synced successfully more than 1 year ago
