Commits · 4f6ae1a49ed5c81501d6f7385416bb4e07289e99 · openanolis / cloud-kernel

08 Jul, 2011 16 commits

xfs: avoid usage of struct xfs_dir2_block · 4f6ae1a4

Authored by Christoph Hellwig on Jul 08, 2011

In most places we can simply pass around and use the struct xfs_dir2_data_hdr,
which is the first and most important member of struct xfs_dir2_block instead
of the full structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: cleanup the definition of struct xfs_dir2_sf_entry · 78f70cd7

Authored by Christoph Hellwig on Jul 08, 2011

Remove the inumber member which is at a variable offset after the actual
name, and make name a real variable sized C99 array instead of the incorrect
one-sized array which confuses (not only) gcc.  Based on this, clean up
the helpers to calculate the entry size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

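The flexible-array idea in the commit above can be illustrated with a minimal userspace sketch. The struct name `sf_entry`, its fields, and the helper `sf_entry_size` are hypothetical and only loosely modeled on the description; they are not the kernel's actual definitions. The sketch assumes the inode number trails the name and is either 4 or 8 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: a fixed header followed by a variable-length name. */
struct sf_entry {
    uint8_t namelen;    /* number of bytes in name[] */
    uint8_t offset[2];  /* opaque cookie, kept as raw bytes */
    uint8_t name[];     /* C99 flexible array member, not a one-element array */
};

/*
 * Size of one entry on disk: the header, the name bytes, and the inode
 * number that follows the name (assumed to be 4 or 8 bytes wide).
 */
static size_t sf_entry_size(uint8_t namelen, size_t ino_bytes)
{
    assert(ino_bytes == 4 || ino_bytes == 8);
    return offsetof(struct sf_entry, name) + namelen + ino_bytes;
}
```

Because `name[]` contributes nothing to the struct's fixed part, the size calculation needs no "minus one" correction, which is the kind of adjustment a one-element array would force.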

xfs: kill struct xfs_dir2_sf · ac8ba50f

Authored by Christoph Hellwig on Jul 08, 2011

The list field of it is never actually used, so all uses can simply be
replaced with the xfs_dir2_sf_hdr_t type that it has as first member.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: cleanup shortform directory inode number handling · 8bc38787

Authored by Christoph Hellwig on Jul 08, 2011

Refactor the shortform directory helpers that deal with the 32-bit vs
64-bit wide inode numbers into more sensible helpers, and kill the
xfs_intino_t typedef that is now superfluous.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

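A pair of helpers that hides the 32-bit vs 64-bit distinction might look roughly like the sketch below. The names `sf_get_ino` and `sf_put_ino` are hypothetical, and the sketch assumes inode numbers are stored as unaligned big-endian byte strings of 4 or 8 bytes; it is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an inode number stored as 4 or 8 unaligned big-endian bytes. */
static uint64_t sf_get_ino(const uint8_t *p, int is_64bit)
{
    size_t n = is_64bit ? 8 : 4;
    uint64_t ino = 0;

    for (size_t i = 0; i < n; i++)
        ino = (ino << 8) | p[i];
    return ino;
}

/* Store an inode number in the width selected by is_64bit. */
static void sf_put_ino(uint8_t *p, int is_64bit, uint64_t ino)
{
    size_t n = is_64bit ? 8 : 4;

    /* A 32-bit slot cannot hold a wider value. */
    assert(is_64bit || ino <= UINT32_MAX);
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)(ino >> (8 * (n - 1 - i)));
}
```

Callers pass only a width flag and a byte pointer, so no dedicated intermediate inode-number type is needed.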

xfs: factor out xfs_dir2_leaf_find_entry · 4fb44c82

Authored by Christoph Hellwig on Jul 08, 2011

Add a new xfs_dir2_leaf_find_entry helper to factor out some duplicate code
from xfs_dir2_leaf_addname and xfs_dir2_leafn_add.  Found by Eric Sandeen using
an automated code duplication checker.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: kill the unused struct xfs_sync_work · 29d104af

Authored by Christoph Hellwig on Jul 08, 2011

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: remove i_transp · f3ca8738

Authored by Christoph Hellwig on Jul 08, 2011

Remove the transaction pointer in the inode.  It's only used to avoid
passing down an argument in the bmap code, and for a few asserts in
the transaction code right now.

Also use the local variable ip in a few more places in xfs_inode_item_unlock,
so that it isn't only used for debug builds after the above change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: fix filesystem freeze race in xfs_trans_alloc · 7a249cf8

Authored by Christoph Hellwig on Jul 08, 2011

As pointed out by Jan, xfs_trans_alloc can race with a concurrent filesystem
freeze when it sleeps during the memory allocation. Fix this by moving the
wait_for_freeze call after the memory allocation. This means moving the
freeze into the low-level _xfs_trans_alloc helper, which thus grows a new
argument. Also fix up some comments in that area while at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>

xfs: improve sync behaviour in the face of aggressive dirtying · 33b8f7c2

Authored by Christoph Hellwig on Jul 08, 2011

The following script from Wu Fengguang shows very bad behaviour in XFS
when aggressively dirtying data during a sync on XFS, with sync times
up to almost 10 times as long as ext4.

A large part of the issue is that XFS writes data out itself two times
in the ->sync_fs method, overriding the livelock protection in the core
writeback code, and another issue is the lock-less xfs_ioend_wait call,
which doesn't prevent new ioends from being queued up while waiting for
the count to reach zero.

This patch removes the XFS-internal sync calls and relies on the VFS
to do its work just like all other filesystems do.  Note that the
i_iocount wait which is rather suboptimal is simply removed here.
We already do it in ->write_inode, which keeps the current suboptimal
behaviour.  We'll eventually need to remove that as well, but that's
material for a separate commit.

------------------------------ snip ------------------------------
#!/bin/sh

umount /dev/sda7
mkfs.xfs -f /dev/sda7
# mkfs.ext4 /dev/sda7
# mkfs.btrfs /dev/sda7
mount /dev/sda7 /fs

echo $((50<<20)) > /proc/sys/vm/dirty_bytes

pid=
for i in `seq 10`
do
	dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 &
	pid="$pid $!"
done

sleep 1

tic=$(date +'%s')
sync
tac=$(date +'%s')

echo
echo sync time: $((tac-tic))
egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo

pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; }
------------------------------ snip ------------------------------
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: split xfs_itruncate_finish · 8f04c47a

Authored by Christoph Hellwig on Jul 08, 2011

Split the guts of xfs_itruncate_finish that loop over the existing extents
and call xfs_bunmapi on them into a new helper, xfs_itruncate_extents.
Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish,
which allows simplifying the latter a lot, by only letting it deal with
the data fork.  As a result xfs_itruncate_finish is renamed to
xfs_itruncate_data to make its use case more obvious.

Also remove the sync parameter from xfs_itruncate_data, which has been
unnecessary since the introduction of the busy extent list in 2002, and
completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was
made a no-op.

I can't actually see why the xfs_attr_inactive needs to set the transaction
sync, but let's keep this patch simple and without changes in behaviour.

Also avoid passing a useless argument to xfs_isize_check, and make it
private to xfs_inode.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: kill xfs_itruncate_start · 857b9778

Authored by Christoph Hellwig on Jul 08, 2011

xfs_itruncate_start is a rather lengthy wrapper that evaluates to a call
to xfs_ioend_wait and xfs_tosspages, and only has two callers.

Instead of using the complicated checks left over from IRIX to decide where
we can truncate the pagecache, just call xfs_tosspages
(aka truncate_inode_pages) directly as we want to get rid of all data
after i_size, and truncate_inode_pages handles incorrect alignments
and too large offsets just fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: always log timestamp updates in xfs_setattr_size · 681b1200

Authored by Christoph Hellwig on Jul 08, 2011

Get rid of the special case where we use unlogged timestamp updates for
a truncate to the current inode size, and just call xfs_setattr_nonsize
for it to treat it like a utimes call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: split xfs_setattr · c4ed4243

Authored by Christoph Hellwig on Jul 08, 2011

Split up xfs_setattr into two functions, one for the complex truncate
handling, and one for the trivial attribute updates.  Also move both
new routines to xfs_iops.c as they are fairly Linux-specific.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: work around bogus gcc warning in xfs_allocbt_init_cursor · dec58f1d

Authored by Christoph Hellwig on Jul 08, 2011

GCC 4.6 complains that an array subscript is above array bounds when
using the btree index to index into the agf_levels array.  The only
two indices passed in are 0 and 1, and we have an assert ensuring that.

Replace the trick of using the array index directly with using constants
in the already existing branch for assigning the XFS_BTREE_LASTREC_UPDATE
flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

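The workaround described above, indexing with constants inside an existing branch rather than with the variable itself, can be sketched as follows. All types and names here (`btnum`, `agf_sketch`, `cursor_sketch`, `CUR_LASTREC_UPDATE`, `init_cursor`) are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

enum btnum { BTNUM_BNO = 0, BTNUM_CNT = 1 };

struct agf_sketch    { uint32_t levels[2]; };
struct cursor_sketch { uint8_t nlevels; unsigned int flags; };

#define CUR_LASTREC_UPDATE 0x1u

static void init_cursor(struct cursor_sketch *cur,
                        const struct agf_sketch *agf, enum btnum btnum)
{
    assert(btnum == BTNUM_BNO || btnum == BTNUM_CNT);
    cur->flags = 0;

    /*
     * Index with a constant in each arm instead of levels[btnum], so the
     * compiler can see that every subscript is within bounds.
     */
    if (btnum == BTNUM_CNT) {
        cur->nlevels = (uint8_t)agf->levels[BTNUM_CNT];
        cur->flags |= CUR_LASTREC_UPDATE;
    } else {
        cur->nlevels = (uint8_t)agf->levels[BTNUM_BNO];
    }
}
```

The behaviour is the same as indexing with `btnum` directly; only the form the compiler analyses changes.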

xfs: re-enable non-blocking behaviour in xfs_map_blocks · dbcdde3e

Authored by Christoph Hellwig on Jul 08, 2011

The non-blocking behaviour in xfs_vm_writepage currently is conditional on
having both the WB_SYNC_NONE sync_mode and the nonblocking flag set.
The latter used to be used by pdflush, kswapd and a few other places
in older kernels, but has been fading out starting with the introduction
of the per-bdi flusher threads.

Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back
the behaviour we want.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: PF_FSTRANS should never be set in ->writepage · 680a647b

Authored by Christoph Hellwig on Jul 08, 2011

Now that we reject direct reclaim in addition to always using GFP_NOFS
allocation there's no chance we'll ever end up in ->writepage with
PF_FSTRANS set.  Add a WARN_ON if we hit this case, and stop checking
if we'd actually need to start a transaction.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

07 Jul, 2011 1 commit

xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da

Authored by Dave Chinner on Jul 04, 2011

When inodes are marked stale in a transaction, they are treated
specially when the inode log item is being inserted into the AIL.
It tries to avoid moving the log item forward in the AIL due to a
race condition with writing the underlying buffer back to disk.
This was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
in the AIL").

To avoid moving the item forward, we return a LSN smaller than the
commit_lsn of the completing transaction, thereby trying to trick
the commit code into not moving the inode forward at all. I'm not
sure this ever worked as intended - it assumes the inode is already
in the AIL, but I don't think the returned LSN would have been small
enough to prevent moving the inode. It appears that the reason it
worked is that the lower LSN of the inodes meant they were inserted
into the AIL and flushed before the inode buffer (which was moved to
the commit_lsn of the transaction).

The big problem is that with delayed logging, the returning of the
different LSN means insertion takes the slow, non-bulk path.  Worse
yet is that insertion is to a position -before- the commit_lsn so it
is doing an AIL traversal on every insertion, and has to walk over
all the items that have already been inserted into the AIL. It's
expensive.

To compound the matter further, with delayed logging inodes are
likely to go from clean to stale in a single checkpoint, which means
they aren't even in the AIL at all when we come across them at AIL
insertion time. Hence these were all getting inserted into the AIL
when they simply do not need to be as inodes marked XFS_ISTALE are
never written back.

Transactional/recovery integrity is maintained in this case by the
other items in the unlink transaction that were modified (e.g. the
AGI btree blocks) and committed in the same checkpoint.

So to fix this, simply unpin the stale inodes directly in
xfs_inode_item_committed() and return -1 to indicate that the AIL
insertion code does not need to do any further processing of these
inodes.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

24 Jun, 2011 3 commits

xfs: prevent bogus assert when trying to remove non-existent attribute · 4a338212

Authored by Dave Chinner on Jun 23, 2011

If the attribute fork on an inode is in btree format and has
multiple levels (i.e. node format rather than leaf format), then a
lookup failure will trigger an assert failure in xfs_da_path_shift
if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to
indicate to the directory btree code that not finding an entry is
not a fatal error. In the case of doing a lookup for a directory
name removal, this is valid as a user cannot insert an arbitrary
name to remove from the directory btree.

However, in the case of the attribute tree, a user has direct
control over the attribute name and can ask for any random name to
be removed without any validation. In this case, fsstress is asking
for a non-existent user.selinux attribute to be removed, and that is
causing xfs_da_path_shift() to fall off the bottom of the tree where
it asserts that a lookup failure is allowed. Because the flag is not
set, we die a horrible death on a debug-enabled kernel.

Prevent this assert from firing on attribute removes by adding the
op_flag XFS_DA_OP_OKNOENT to attribute removal operations.

Discovered when testing on a SELinux enabled system by fsstress in
test 070 by trying to remove a non-existent user.selinux attribute.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: clear XFS_IDIRTY_RELEASE on truncate down · df4368a1

Authored by Dave Chinner on Jun 23, 2011

When an inode is truncated down, speculative preallocation is
removed from the inode. This should also reset the state bits for
controlling whether preallocation is subsequently removed when the
file is next closed. The flag is not being cleared, so repeated
operations on a file that first involve a truncate (e.g. multiple
repeated dd invocations on a file) give different file layouts for
the second and subsequent invocations.

Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the
XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure
that speculative delalloc is removed on files that have been
truncated down.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: reset inode per-lifetime state when recycling it · 778e24bb

Authored by Dave Chinner on Jun 23, 2011

XFS inodes have several per-lifetime state fields that determine the
behaviour of the inode. These state fields are not all reset when an
inode is reused from the reclaimable state.

This can lead to unexpected behaviour of the new inode such as
speculative preallocation not being truncated away in the expected
manner for local files until the inode is subsequently truncated,
freed or cycles out of the cache. It can also lead to an inode being
considered to be a filestream inode or having been truncated when
that is not the case.

Rework the reinitialisation of the inode when it is recycled to
ensure that it is pristine before it is reused. While there, also
fix the resetting of state flags in the recycling error paths so the
inode does not become unreclaimable.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

16 Jun, 2011 1 commit

xfs: make log devices with write back caches work · a27a263b

Authored by Christoph Hellwig on Jun 16, 2011

There's no reason not to support cache flushing on external log devices.
The only thing this really requires is flushing the data device first
both in fsync and log commits.  A side effect is that we also have to
remove the barrier write test during mount, which has been superfluous
since the new FLUSH+FUA code anyway.  Also use the chance to flush the
RT subvolume write cache before the fsync commit, which is required
for correct semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

15 Jun, 2011 1 commit

xfs: fix ->mknod() return value on xfs_get_acl() failure · c46a131c

Authored by Al Viro on Jun 05, 2011

->mknod() should return negative on errors and PTR_ERR() gives
an already negative value...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

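The sign issue in the commit above can be illustrated with a small userspace re-creation of the error-pointer idiom. The lowercase helpers `err_ptr`, `ptr_err` and `is_err`, the stub `get_acl_stub`, and the function `mknod_like` are hypothetical stand-ins written for this sketch; they only mimic the general shape of the kernel's `ERR_PTR()`/`PTR_ERR()`/`IS_ERR()` helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Decode it again; the result is already negative. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Stub standing in for an ACL lookup that may fail. */
static void *get_acl_stub(int fail)
{
    static int acl;
    return fail ? err_ptr(-ENOMEM) : (void *)&acl;
}

static long mknod_like(int fail)
{
    void *acl = get_acl_stub(fail);

    if (is_err(acl))
        return ptr_err(acl);  /* correct: negating here would yield a positive value */
    return 0;
}
```

Returning `-ptr_err(acl)` instead would hand the caller a positive number, which a caller expecting negative error codes would not treat as a failure.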

27 May, 2011 1 commit

fs: pass exact type of data dirties to ->dirty_inode · aa385729

Authored by Christoph Hellwig on May 27, 2011

Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet.  I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block.  That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

25 May, 2011 12 commits

xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent · 233eebb9

Authored by Christoph Hellwig on May 11, 2011

The code in xfs_bmap_del_extent does not correctly decrement the
extent buffer index when deleting a whole extent.  Most of the time
this gets caught by checks in xfs_bmapi that work around it and
decrement it manually and thus wasn't noticed so far.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec · 87bef181

Authored by Christoph Hellwig on May 11, 2011

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: fix up asserts in xfs_iflush_fork · ab1908a5

Authored by Christoph Hellwig on May 11, 2011

Remove asserts in xfs_iflush_fork that would call xfs_iext_get_ext
with a potentially invalid extent buffer index.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: do not do pointer arithmetic on extent records · f1c63b73

Authored by Christoph Hellwig on May 11, 2011

We need to call xfs_iext_get_ext for the previous extent to get a
valid pointer, and can't just do pointer arithmetic as they might
be in different pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

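Why pointer arithmetic is unsafe here can be shown with a toy record list whose storage is split across separately allocated chunks. The types `rec` and `rec_list` and the accessors `rec_get` and `rec_prev` are hypothetical; the sketch only assumes that logically adjacent records may live in different allocations.

```c
#include <assert.h>
#include <stddef.h>

#define RECS_PER_CHUNK 4

struct rec { unsigned long start, len; };

/* Records are spread over separately allocated chunks ("pages"). */
struct rec_list {
    struct rec *chunks[8];
    size_t nrecs;
};

/* Look a record up by index; the only safe way to reach any record. */
static struct rec *rec_get(struct rec_list *l, size_t idx)
{
    assert(idx < l->nrecs);
    return &l->chunks[idx / RECS_PER_CHUNK][idx % RECS_PER_CHUNK];
}

/*
 * Previous record by index. Writing "ep - 1" on a record pointer would be
 * wrong whenever ep is the first record of its chunk.
 */
static struct rec *rec_prev(struct rec_list *l, size_t idx)
{
    return idx > 0 ? rec_get(l, idx - 1) : NULL;
}
```

For index 4 in this layout, the previous record is the last slot of chunk 0, which has no fixed address relationship to the first slot of chunk 1.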

xfs: do not use unchecked extent indices in xfs_bunmapi · 00239acf

Authored by Christoph Hellwig on May 11, 2011

Make sure to only call xfs_iext_get_ext after we've validated the
extent index when moving on to the next index in xfs_bunmapi.  Also
remove the old workaround for too large indices that has been
superseded by the proper fix in xfs_bmap_del_extent.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: do not use unchecked extent indices in xfs_bmapi · 5690f921

Authored by Christoph Hellwig on May 11, 2011

Make sure to only call xfs_iext_get_ext after we've validated the
extent index when moving on to the next index in xfs_bmapi.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* · 2f2b3220

Authored by Christoph Hellwig on May 11, 2011

Make sure to only call xfs_iext_get_ext after we've validated the
extent index in the various xfs_bmap_add_extent_* helpers.

Based on an earlier patch from Lachlan McIlroy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: remove if_lastex · ec90c556

Authored by Christoph Hellwig on May 23, 2011

The if_lastex field in struct xfs_ifork is only used as a temporary
index during xfs_bmapi and xfs_bunmapi.  Instead of using the inode
fork to store it keep it local in the callchain.  Fortunately this
is very easy as we already pass a stack copy of it down the whole
chain which can simply be changed to be passed by reference.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag · 54893273

Authored by Christoph Hellwig on May 11, 2011

The XFS_BMAPI_RSVBLOCKS flag is unused, and as far as I can see has
always been.  Remove it to simplify the bmapi implementation and
conserve stack space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

vmscan: change shrinker API by passing shrink_control struct · 1495f230

Authored by Ying Han on May 24, 2011

Change each shrinker's API by consolidating the existing parameters into
a shrink_control struct.  This will simplify adding further features without
touching each shrinker's file.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

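The general pattern of folding a callback's loose parameters into one control struct might be sketched like this. The names `shrink_control_sketch`, `cache_sketch` and `cache_shrink` are hypothetical and the field set is an assumption made for illustration; this is not the kernel's shrinker interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a consolidated argument struct. */
struct shrink_control_sketch {
    unsigned int gfp_mask;     /* allocation context of the caller */
    unsigned long nr_to_scan;  /* 0 means "only report the object count" */
};

struct cache_sketch {
    unsigned long nr_objects;
};

/*
 * One struct pointer replaces a growing list of scalar arguments, so a new
 * field can be added later without changing every callback's signature.
 */
static unsigned long cache_shrink(struct cache_sketch *cache,
                                  const struct shrink_control_sketch *sc)
{
    assert(cache != NULL && sc != NULL);

    unsigned long freed = sc->nr_to_scan < cache->nr_objects
                        ? sc->nr_to_scan : cache->nr_objects;

    cache->nr_objects -= freed;
    return cache->nr_objects;  /* objects remaining in the cache */
}
```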

xfs: do not discard alloc btree blocks · 55a7bc5a

Authored by Christoph Hellwig on May 04, 2011

Blocks for the allocation btree are allocated from and released to
the AGFL, and thus frequently reused. Even worse we do not have an
easy way to avoid using an AGFL block when it is discarded due to
the simple FILO list of free blocks, and thus can frequently stall
on blocks that are currently undergoing a discard.

Add a flag to the busy extent tracking structure to skip the discard
for allocation btree blocks. In normal operation these blocks are
reused frequently enough that there is no need to discard them
anyway, but if they spill over to the allocation btree as part of a
balance we "leak" blocks that we would otherwise discard. We could
fix this by adding another flag and keeping these blocks in the
rbtree even after they aren't busy any more so that we could discard
them when they migrate out of the AGFL. Given that this would cause
significant overhead I don't think it's worthwhile for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: add online discard support · e84661aa

Authored by Christoph Hellwig on May 20, 2011

Now that we have reliable tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard.  Note that we don't bother
supporting discard for the non-delaylog mode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

20 May, 2011 5 commits

xfs: obey minleft values during extent allocation correctly · bf59170a

Authored by Dave Chinner on Apr 21, 2011

When allocating an extent that is long enough to consume the
remaining free space in an AG, we need to ensure that the allocation
leaves enough space in the AG for any subsequent bmap btree blocks
that are needed to track the new extent. These have to be allocated
in the same AG as we only reserve enough blocks in an allocation
transaction for modification of the freespace trees in a single AG.

xfs_alloc_fix_minleft() has been considering blocks on the AGFL as
free blocks available for extent and bmbt block allocation, which is
not correct - blocks on the AGFL are there exclusively for the use
of the free space btrees. As a result, when minleft is less than the
number of blocks on the AGFL, xfs_alloc_fix_minleft() does not trim
the given extent to leave minleft blocks available for bmbt
allocation, and hence we can fail allocation during bmbt record
insertion.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

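The arithmetic behind the minleft trimming can be sketched in a few lines. The function `fix_minleft` and its parameters are hypothetical and only loosely follow the description above; in particular, the sketch assumes that only the AG's free block count, and not the blocks sitting on the AGFL, may be counted toward what is left over.

```c
#include <assert.h>

/*
 * Trim a candidate extent of *len blocks so that at least minleft blocks
 * remain free afterwards. freeblks deliberately excludes AGFL blocks.
 * Returns 1 if the (possibly trimmed) extent is still usable, 0 otherwise.
 */
static int fix_minleft(long freeblks, long minlen, long minleft, long *len)
{
    assert(len != 0 && *len >= minlen);

    long diff = freeblks - *len - minleft;

    if (diff >= 0)
        return 1;        /* enough space is left over; nothing to trim */
    *len += diff;        /* diff is negative: shrink the extent */
    return *len >= minlen;
}
```

For example, with 100 free blocks, a 100-block candidate and `minleft` of 4, the extent is trimmed to 96 blocks. Counting AGFL blocks as free would have skipped that trim and left nothing for the later bmbt allocation.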

xfs: reset buffer pointers before freeing them · 44396476

Authored by Dave Chinner on Apr 21, 2011

When we free a vmapped buffer, we need to ensure the vmap address
and length we free is the same as when it was allocated. In various
places in the log code we change the memory the buffer is pointing
to before issuing IO, but we never reset the buffer to point back to
its original memory (or no memory, if that is the case for the
buffer).

As a result, when we free the buffer it points to memory that is
owned by something else and attempts to unmap and free it. Because
the range does not match any known mapped range, it can trigger
BUG_ON() traps in the vmap code, and potentially corrupt the vmap
area tracking.

Fix this by always resetting these buffers to their original state
before freeing them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: avoid getting stuck during async inode flushes · ee58abdf

Authored by Dave Chinner on Apr 21, 2011

When the underlying inode buffer is locked and xfs_sync_inode_attr()
is doing a non-blocking flush, xfs_iflush() can return EAGAIN.  When
this happens, clear the error rather than returning it to
xfs_inode_ag_walk(), as returning EAGAIN will result in the AG walk
delaying for a short while and trying again. This can result in
background walks getting stuck on the one AG until the inode buffer is
unlocked by some other means.

This behaviour was noticed when analysing event traces followed by
code inspection and verification of the fix via further traces.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

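The error handling described above, swallowing EAGAIN from a non-blocking flush so the caller's walk does not retry in place, might look roughly like this. The names `flush_stub`, `sync_inode_attr_sketch` and the `FLUSH_*` constants are hypothetical, and the sketch assumes positive errno values are used as return codes.

```c
#include <assert.h>
#include <errno.h>

enum { FLUSH_SYNC = 0, FLUSH_NONBLOCK = 1 };

/* Stub flush: a non-blocking flush gives up when the buffer is locked. */
static int flush_stub(int buffer_locked, int mode)
{
    if (buffer_locked && mode == FLUSH_NONBLOCK)
        return EAGAIN;
    return 0;
}

static int sync_inode_attr_sketch(int buffer_locked, int mode)
{
    int error = flush_stub(buffer_locked, mode);

    /*
     * Skipping a busy inode is fine for a background walk; passing EAGAIN
     * up would make the walker delay and retry the same AG.
     */
    if (error == EAGAIN) {
        assert(mode == FLUSH_NONBLOCK);
        error = 0;
    }
    return error;
}
```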

xfs: fix xfs_itruncate_start tracing · e5737515

Authored by Dave Chinner on Apr 21, 2011

Variables are ordered incorrectly in the trace call.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: fix duplicate workqueue initialisation · 1beb65ad

Authored by Dave Chinner on May 10, 2011

The workqueue initialisation function is called twice when
initialising the XFS subsystem. Remove the second initialisation
call.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

openanolis / cloud-kernel synced successfully almost 2 years ago
