提交 · 7fe3258c50de383037102129c57df5cb66ab2000 · openanolis / cloud-kernel

17 4月, 2013 2 次提交

xfs: Update xfs_log_commit_cil() comments · 7fe3258c

由 Jeff Liu 提交于 4月 04, 2013

xfs_log_commit_iclog() function has been removed by commits 93b8a585:
	xfs: remove the deprecated nodelaylog option

Beginning from Linux 3.3, only delayed logging is supported so that
we call xfs_log_commit_cil() at xfs_trans_commit() only, remove the
useless comments so.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

7fe3258c

xfs: Remove the obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros · d4fd0e92

由 Jeff Liu 提交于 4月 04, 2013

There is no more users of this Macro, so it's time to kill it dead.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

d4fd0e92

06 4月, 2013 1 次提交

xfs: don't free EFIs before the EFDs are committed · 666d644c

由 Dave Chinner 提交于 4月 03, 2013

Filesystems are occasionally being shut down with this error:

xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.

It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:

http://oss.sgi.com/archives/xfs/2013-01/msg00082.html

The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:

EFI			EFI in AIL	EFD		Result
committed/unpinned	Yes		committed	OK
committed/pinned	No		committed	Shutdown
uncommitted		No		committed	Shutdown


Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck.  That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be
fine.

It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the
AIL.

What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is that the EFI is not freed until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.

This can be done easily by replacing the XFS_EFI_COMMITTED with an
EFI reference count. The EFI is initialised with it's own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.

Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.

However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. In terms of log recovery,
during the committing of the EFD we can look for the
XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
thereby ensuring everything works correctly there as well.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

666d644c

04 4月, 2013 1 次提交

xfs: Add ratelimited printk for different alert levels · 3d6e0361

由 Rich Johnston 提交于 3月 27, 2013

Ratelimited printk will be useful in printing xfs messages which are otherwise
not required to be printed always due to their high rate (to prevent kernel ring
buffer from overflowing), while at the same time required to be printed.
Signed-off-by: NRaghavendra D Prabhu <rprabhu@wnohang.net>
Reviewed-by: NRich Johnston <rjohnston@sgi.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

3d6e0361

23 3月, 2013 7 次提交

xfs: Fix WARN_ON(delalloc) in xfs_vm_releasepage() · ff9a28f6

由 Jan Kara 提交于 3月 14, 2013

When a dirty page is truncated from a file but reclaim gets to it before
truncate_inode_pages(), we hit WARN_ON(delalloc) in
xfs_vm_releasepage(). This is because reclaim tries to write the page,
xfs_vm_writepage() just bails out (leaving page clean) and thus reclaim
thinks it can continue and calls xfs_vm_releasepage() on page with dirty
buffers.

Fix the issue by redirtying the page in xfs_vm_writepage(). This makes
reclaim stop reclaiming the page and also logically it keeps page in a
more consistent state where page with dirty buffers has PageDirty set.
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

ff9a28f6

xfs: xfs_iomap_prealloc_size() tracepoint · 19cb7e38

由 Brian Foster 提交于 3月 18, 2013

Add a tracepoint to provide some feedback on preallocation size
calculation.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

19cb7e38

xfs: add quota-driven speculative preallocation throttling · 76a4202a

由 Brian Foster 提交于 3月 18, 2013

Introduce the need_throttle() and calc_throttle() functions to
independently check whether throttling is required for a particular
dquot and if so, calculate the associated throttling metrics based
on the state of the quota. We use the same general algorithm to
calculate the throttle shift as for global free space with the
exception of using three stages rather than five.

Update xfs_iomap_prealloc_size() to use the smallest available
prealloc size based on each of the constraints and apply the
maximum shift to obtain the throttled preallocation size.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

76a4202a

xfs: xfs_dquot prealloc throttling watermarks and low free space · b1366451

由 Brian Foster 提交于 3月 18, 2013

Enable tracking of high and low watermarks for preallocation
throttling of files under quota restrictions. These values are
calculated when the quota limit is read from disk or modified and
cached for later use by the throttling algorithm.

The high watermark specifies when preallocation is disabled, the
low watermark specifies when throttling is enabled and the low free
space data structure contains precalculated low free space limits
to serve as input to determine the level of throttling required.

Note that the low free space data structure is based on the
existing global low free space data structure with the exception of
using three stages (5%, 3% and 1%) rather than five to reduce the
impact of xfs_dquot memory overhead.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

b1366451

xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t · 4b6eae2e

由 Brian Foster 提交于 3月 18, 2013

Modify xfs_qm_adjust_dqlimits() to take the xfs_dquot as a
parameter instead of just the xfs_disk_dquot_t so we can update
in-memory fields if necessary.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

4b6eae2e

xfs: push rounddown_pow_of_two() to after prealloc throttle · c9bdbdc0

由 Brian Foster 提交于 3月 18, 2013

The round down occurs towards the beginning of the function. Push
it down after throttling has occurred. This is to support adding
further transformations to 'alloc_blocks' that might not preserve
power-of-two alignment (and thus could lead to rounding down
multiple times).
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

c9bdbdc0

xfs: reorganize xfs_iomap_prealloc_size to remove indentation · 3c58b5f8

由 Brian Foster 提交于 3月 18, 2013

The majority of xfs_iomap_prealloc_size() executes within the
check for lack of default I/O size. Reverse the logic to remove the
extra indentation.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

3c58b5f8

15 3月, 2013 3 次提交

xfs: take inode version into account in XFS_LITINO · 56cea2d0

由 Christoph Hellwig 提交于 3月 12, 2013

Add a version argument to XFS_LITINO so that it can return different values
depending on the inode version.  This is required for the upcoming v3 inodes
with a larger fixed layout dinode.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

56cea2d0

xfs: ensure we capture IO errors correctly · c163f9a1

由 Dave Chinner 提交于 3月 12, 2013

Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorectly
detect an error during IO completion due to the stale error value
left on the buffer.

Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected those captured
from are those captured from bio submission or completion.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

c163f9a1

xfs: Remove obsoleted m_inode_shrink from xfs_mount structure · d8ddfe81

由 Jeff Liu 提交于 3月 11, 2013

Looks the old m_inode_shrink is obsoleted as we perform inodes reclaim per AG via
m_reclaim_workqueue, this patch remove it from the xfs_mount structure if so.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

d8ddfe81

08 3月, 2013 6 次提交

xfs: rearrange some code in xfs_bmap for better locality · 9e5987a7

由 Dave Chinner 提交于 2月 25, 2013

xfs_bmap.c is a big file, and some of the related code is spread all
throughout the file requiring function prototypes for static
function and jumping all through the file to follow a single call
path. Rearrange the code so that:

	a) related functionality is grouped together; and
	b) functions are grouped in call dependency order

While the diffstat is large, there are no code changes in the patch;
it is just moving the functionality around and removing the function
prototypes at the top of the file. The resulting layout of the code
is as follows (top of file to bottom):

	- miscellaneous helper functions
	- extent tree block counting routines
	- debug/sanity checking code
	- bmap free list manipulation functions
	- inode fork format manipulation functions
	- internal/external extent tree seach functions
	- extent tree manipulation functions used during allocation
	- functions used during extent read/allocate/removal
	  operations (i.e. xfs_bmapi_write, xfs_bmapi_read,
	  xfs_bunmapi and xfs_getbmap)

This means that following logic paths through the bmapi code is much
simpler - most of the code relevant to a specific operation is now
clustered together rather than spread all over the file....
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

9e5987a7

xfs: rename random32() to prandom_u32() · ecb3403d

由 Akinobu Mita 提交于 3月 04, 2013

Use more preferable function name which implies using a pseudo-random
number generator.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Acked-by: <bpm@sgi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: xfs@oss.sgi.com
Signed-off-by: NBen Myers <bpm@sgi.com>

ecb3403d

xfs: don't verify buffers after IO errors · d5929de8

由 Dave Chinner 提交于 2月 27, 2013

When we read a buffer, we might get an error from the underlying
block device and not the real data. Hence if we get an IO error, we
shouldn't run the verifier but instead just pass the IO error
straight through.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

d5929de8

xfs: fix xfs_iomap_eof_prealloc_initial_size type · e8108ced

由 Mark Tinguely 提交于 2月 24, 2013

Fix the return type of xfs_iomap_eof_prealloc_initial_size() to
xfs_fsblock_t to reflect the fact that the return value may be an
unsigned 64 bits if XFS_BIG_BLKNOS is defined.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

e8108ced

xfs: increase prealloc size to double that of the previous extent · e114b5fc

由 Brian Foster 提交于 2月 19, 2013

The updated speculative preallocation algorithm for handling sparse
files can becomes less effective in situations with a high number of
concurrent, sequential writers. The number of writers and amount of
available RAM affect the writeback bandwidth slicing algorithm,
which in turn affects the block allocation pattern of XFS. For
example, running 32 sequential writers on a system with 32GB RAM,
preallocs become fixed at a value of around 128MB (instead of
steadily increasing to the 8GB maximum as sequential writes
proceed).

Update the speculative prealloc heuristic to base the size of the
next prealloc on double the size of the preceding extent. This
preserves the original aggressive speculative preallocation
behavior and continues to accomodate sparse files at a slight cost
of increasing the size of preallocated data regions following holes
of sparse files.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

e114b5fc

xfs: fix potential infinite loop in xfs_iomap_prealloc_size() · e78c420b

由 Brian Foster 提交于 2月 22, 2013

If freesp == 0, we could end up in an infinite loop while squashing
the preallocation. Break the loop when we've killed the prealloc
entirely.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

e78c420b

28 2月, 2013 1 次提交

hlist: drop the node parameter from iterators · b67bfe0d

由 Sasha Levin 提交于 2月 27, 2013

I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b67bfe0d

26 2月, 2013 1 次提交

fs: encode_fh: return FILEID_INVALID if invalid fid_type · 94e07a75

由 Namjae Jeon 提交于 2月 17, 2013

This patch is a follow up on below patch:

[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbdSigned-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: NVivek Trivedi <t.vivek@samsung.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
Acked-by: NSage Weil <sage@inktank.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

94e07a75

23 2月, 2013 1 次提交
- A
  new helper: file_inode(file) · 496ad9aa
  由 Al Viro 提交于 1月 23, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  496ad9aa
15 2月, 2013 4 次提交

xfs: xfs_bmap_add_attrfork_local is too generic · 1e82379b

由 Dave Chinner 提交于 2月 11, 2013

When we are converting local data to an extent format as a result of
adding an attribute, the type of data contained in the local fork
determines the behaviour that needs to occur.

xfs_bmap_add_attrfork_local() already handles the directory data
case specially by using S_ISDIR() and calling out to
xfs_dir2_sf_to_block(), but with verifiers we now need to handle
each different type of metadata specially and different metadata
formats require different verifiers (and eventually block header
initialisation).

There is only a single place that we add and attribute fork to
the inode, but that is in the attribute code and it knows nothing
about the specific contents of the data fork. It is only the case of
local data that is the issue here, so adding code to hadnle this
case in the attribute specific code is wrong. Hence we are really
stuck trying to detect the data fork contents in
xfs_bmap_add_attrfork_local() and performing the correct callout
there.

Luckily the current cases can be determined by S_IS* macros, and we
can push the work off to data specific callouts, but each of those
callouts does a lot of work in common with
xfs_bmap_local_to_extents(). The only reason that this fails for
symlinks right now is is that xfs_bmap_local_to_extents() assumes
the data fork contains extent data, and so attaches a a bmap extent
data verifier to the buffer and simply copies the data fork
information straight into it.

To fix this, allow us to pass a "formatting" callback into
xfs_bmap_local_to_extents() which is responsible for setting the
buffer type, initialising it and copying the data fork contents over
to the new buffer. This allows callers to specify how they want to
format the new buffer (which is necessary for the upcoming CRC
enabled metadata blocks) and hence make xfs_bmap_local_to_extents()
useful for any type of data fork content.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> 
Signed-off-by: NBen Myers <bpm@sgi.com>

1e82379b

xfs: remove log force from xfs_buf_trylock() · fa5566e4

由 Brian Foster 提交于 2月 11, 2013

The trylock log force invoked via xfs_buf_item_push() can attempt
to acquire xa_lock, thus leading to a recursion bug when called
with xa_lock held.

This log force was originally added to xfs_buf_trylock() to address
xfsaild stalls due to pinned and stale buffers. Since the addition
of this behavior, the log item pushing code had been reworked to
detect and track pinned items to inform xfsaild to issue a log
force itself when necessary. As such, the log force on trylock
failure is redundant and safe to remove.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

fa5566e4

xfs: recheck buffer pinned status after push trylock failure · 5337fe9b

由 Brian Foster 提交于 2月 11, 2013

The buffer pinned check and trylock sequence in xfs_buf_item_push()
can race with an active transaction on marking the buffer pinned.
This can result in the buffer becoming pinned and stale after the
initial check and the trylock failure, but before the check in
xfs_buf_trylock() that issues a log force. If the log force is
issued from this context, a spinlock recursion occurs on xa_lock.

Prepare xfs_buf_item_push() to handle the race by detecting a
pinned buffer after the trylock failure so xfsaild issues a log
force from a safe context. This, along with various previous fixes,
renders the log force in xfs_buf_trylock() redundant.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

5337fe9b

xfs: limit speculative prealloc size on sparse files · a1e16c26

由 Dave Chinner 提交于 2月 11, 2013

Speculative preallocation based on the current file size works well
for contiguous files, but is sub-optimal for sparse files where the
EOF preallocation can fill holes and result in large amounts of
zeros being written when it is not necessary.

The algorithm is modified to prevent EOF speculative preallocation
from triggering larger allocations on IO patterns of
truncate--to-zero-seek-write-seek-write-....  which results in
non-sparse files for large files. This, unfortunately, is the way cp
now behaves when copying sparse files and so needs to be fixed.

What this code does is that it looks at the existing extent adjacent
to the current EOF and if it determines that it is a hole we disable
speculative preallocation altogether. To avoid the next write from
doing a large prealloc, it takes the size of subsequent
preallocations from the current size of the existing EOF extent.
IOWs, if you leave a hole in the file, it resets preallocation
behaviour to the same as if it was a zero size file.

Example new behaviour:

$ xfs_io -f -c "pwrite 0 31m" \
            -c "pwrite 33m 1m" \
            -c "pwrite 128m 1m" \
            -c "fiemap -v" /mnt/scratch/blah
wrote 32505856/32505856 bytes at offset 0
31 MiB, 7936 ops; 0.0000 sec (1.608 GiB/sec and 421432.7439 ops/sec)
wrote 1048576/1048576 bytes at offset 34603008
1 MiB, 256 ops; 0.0000 sec (1.462 GiB/sec and 383233.5329 ops/sec)
wrote 1048576/1048576 bytes at offset 134217728
1 MiB, 256 ops; 0.0000 sec (1.719 GiB/sec and 450704.2254 ops/sec)
/mnt/scratch/blah:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..65535]:      96..65631        65536   0x0
   1: [65536..67583]:  hole              2048
   2: [67584..69631]:  67680..69727      2048   0x0
   3: [69632..262143]: hole             192512
   4: [262144..264191]: 262240..264287    2048   0x1
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

a1e16c26

07 2月, 2013 1 次提交

xfs: memory barrier before wake_up_bit() · 311f08ac

由 Alex Elder 提交于 2月 04, 2013

In xfs_ifunlock() there is a call to wake_up_bit() after clearing
the flush lock on the xfs inode.  This is not guaranteed to be safe,
as noted in the comments above wake_up_bit() beginning with:

    In order for this to function properly, as it uses
    waitqueue_active() internally, some kind of memory
    barrier must be done prior to calling this.
Signed-off-by: NAlex Elder <elder@inktank.com>
Reviewed-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

311f08ac

02 2月, 2013 12 次提交

xfs: refactor space log reservation for XFS_TRANS_ATTR_SET · a21cd503

由 Jeff Liu 提交于 1月 28, 2013

Currently, we calculate the attribute set transaction
log space reservation at runtime in two parts:

1) XFS_ATTRSET_LOG_RES() which is calcuated out at mount time.

2) ((ext * (mp)->m_sb.sb_sectsize) + \
    (ext * XFS_FSB_TO_B((mp), XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))) + \
    (128 * (ext + (ext * XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))))))
which is calculated out at runtime since it depend on the given extent length in blocks.

This patch renamed XFS_ATTRSET_LOG_RES(mp) to XFS_ATTRSETM_LOG_RES(mp) to indicate
that it is figured out at mount time.  Introduce XFS_ATTRSETRT_LOG_RES(mp) which would
be used to calculate out the unit of the log space reservation for one block.

In this way, the total runtime space for the given extent length can be figured out by:
XFS_ATTRSETM_LOG_RES(mp) + XFS_ATTRSETRT_LOG_RES(mp) * ext
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

a21cd503

xfs: make use of XFS_SB_LOG_RES() at xfs_fs_log_dummy() · 762c585b

由 Jeff Liu 提交于 1月 28, 2013

Make use of XFS_SB_LOG_RES() at xfs_fs_log_dummy().
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

762c585b

xfs: make use of XFS_SB_LOG_RES() at xfs_mount_log_sb() · 5166ab06

由 Jeff Liu 提交于 1月 28, 2013

Make use of XFS_SB_LOG_RES() at xfs_mount_log_sb().
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

5166ab06

xfs: make use of XFS_SB_LOG_RES() at xfs_log_sbcount() · e457274b

由 Jeff Liu 提交于 1月 28, 2013

Make use of XFS_SB_LOG_RES() at xfs_log_sbcount().
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

e457274b

xfs: introduce XFS_SB_LOG_RES() for transactions that modify sb on disk · a7bd794a

由 Jeff Liu 提交于 1月 28, 2013

Introduce a new transaction space reservation XFS_SB_LOG_RES() for
those transactions that need to modify the superblock on disk.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

a7bd794a

xfs: calculate XFS_TRANS_QM_QUOTAOFF_END space log reservation at mount time · 762d7ba6

由 Jeff Liu 提交于 1月 28, 2013

Convert the calculation for end of quotaoff log space reservation
from runtime to mount time.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

762d7ba6

xfs: calculate XFS_TRANS_QM_QUOTAOFF space log reservation at mount time · a1bd9557

由 Jeff Liu 提交于 1月 28, 2013

Convert the calculation of quota off transaction log space reservation
from runtime to mount time.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

a1bd9557

xfs: calculate XFS_TRANS_QM_DQALLOC space log reservation at mount time · 48001044

由 Jeff Liu 提交于 1月 28, 2013

The disk quota allocation log space reservation is calcuated at runtime,
this patch does it at mount time.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

48001044

xfs: calcuate XFS_TRANS_QM_SETQLIM space log reservation at mount time · f0f2df94

由 Jeff Liu 提交于 1月 28, 2013

For adjusting quota limits transactions, we calculate out the log space
reservation at runtime, this patch does it at mount time.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

f0f2df94

xfs: calculate xfs_qm_write_sb_changes() space log reservation at mount time · f910a8c6

由 Jeff Liu 提交于 1月 28, 2013

For the transaction that write the incore superblock changes of quota flags
to disk, it would reserve the same log space to clear/reset quota flags
transaction, hence we can use XFS_TRANS_SBCHANGE_LOG_RES() for it as well.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

f910a8c6

xfs: calculate XFS_TRANS_QM_SBCHANGE space log reservation at mount time · b0c10b98

由 Jeff Liu 提交于 1月 28, 2013

The transaction log space for clearing/reseting the quota flags
is calculated out at runtime, this patch can figure it out at
mount time.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

b0c10b98

xfs: make use of xfs_calc_buf_res() in xfs_trans.c · 5b292ae3

由 Jeff Liu 提交于 2月 01, 2013

Refining the existing reservations with xfs_calc_buf_res() in xfs_trans.c
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

5b292ae3

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功