提交 · a36b926180cda375ac2ec89e1748b47137cfc51c · openanolis / cloud-kernel

31 1月, 2017 6 次提交

xfs: pull up iolock from xfs_free_eofblocks() · a36b9261

由 Brian Foster 提交于 1月 27, 2017

xfs_free_eofblocks() requires the IOLOCK_EXCL lock, but is called from
different contexts where the lock may or may not be held. The
need_iolock parameter exists for this reason, to indicate whether
xfs_free_eofblocks() must acquire the iolock itself before it can
proceed.

This is ugly and confusing. Simplify the semantics of
xfs_free_eofblocks() to require the caller to acquire the iolock
appropriately and kill the need_iolock parameter. While here, the mp
param can be removed as well as the xfs_mount is accessible from the
xfs_inode structure. This patch does not change behavior.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

a36b9261

xfs: remove unused struct declarations · 64f61ab6

由 Eric Sandeen 提交于 1月 27, 2017

After scratching my head looking for "xfs_busy_extent" I realized
it's not used; it's xfs_extent_busy, and the declaration for the
other name is bogus.  Remove that and a few others as well.

(struct xfs_log_callback is used, but the 2nd declaration is
unnecessary).
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

64f61ab6

iomap: constify struct iomap_ops · 8ff6daa1

由 Christoph Hellwig 提交于 1月 27, 2017

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

8ff6daa1

xfs: remove boilerplate around xfs_btree_init_block · b6f41e44

由 Eric Sandeen 提交于 1月 27, 2017

Now that xfs_btree_init_block_int is able to determine crc
status from the passed-in mp, we can determine the proper
magic as well if we are given a btree number, rather than
an explicit magic value.

Change xfs_btree_init_block[_int] callers to pass in the
btree number, and let xfs_btree_init_block_int use the
xfs_magics array via the xfs_btree_magic macro to determine
which magic value is needed.  This makes all of the
if (crc) / else stanzas identical, and the if/else can be
removed, leading to a single, common init_block call.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

b6f41e44

xfs: make xfs_btree_magic more generic · af7d20fd

由 Eric Sandeen 提交于 1月 27, 2017

Right now the xfs_btree_magic() define takes only a cursor;
change this to take crc and btnum args to make it more generically
useful, and move to a function.

This will allow xfs_btree_init_block_int callers which don't
have a cursor to make use of the xfs_magics array, which will
happen in the next patch.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

af7d20fd

xfs: glean crc status from mp not flags in xfs_btree_init_block_int · f88ae46b

由 Eric Sandeen 提交于 1月 27, 2017

xfs_btree_init_block_int() can determine whether crcs are
in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag;
the mp argument allows us to determine this from the
superblock.  Remove the flag from callers, and use
xfs_sb_version_hascrc(&mp->m_sb) internally instead.

This removes one difference between the if & else cases
in the callers.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

f88ae46b

28 1月, 2017 1 次提交

xfs: prevent quotacheck from overloading inode lru · e0d76fa4

由 Brian Foster 提交于 1月 26, 2017

Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() lead to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.

Commit 17c12bcd ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release.  The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.

Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.

CC: stable@vger.kernel.org # v4.9
Reported-by: NMartin Svec <martin.svec@zoner.cz>
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

e0d76fa4

27 1月, 2017 1 次提交

xfs: fix bmv_count confusion w/ shared extents · c364b6d0

由 Darrick J. Wong 提交于 1月 26, 2017

In a bmapx call, bmv_count is the total size of the array, including the
zeroth element that userspace uses to supply the search key. The output
array starts at offset 1 so that we can set up the user for the next
invocation. Since we now can split an extent into multiple bmap records
due to shared/unshared status, we have to be careful that we don't
overflow the output array.

In the original patch f86f4037 ("xfs: teach get_bmapx about shared
extents and the CoW fork") I used cur_ext (the output index) to check
for overflows, albeit with an off-by-one error. Since nexleft no longer
describes the number of unfilled slots in the output, we can rip all
that out and use cur_ext for the overflow check directly.

Failure to do this causes heap corruption in bmapx callers such as
xfs_io and xfs_scrub. xfs/328 can reproduce this problem.
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

c364b6d0

26 1月, 2017 2 次提交

xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b

由 Darrick J. Wong 提交于 1月 25, 2017

If we try to allocate memory pages to back an xfs_buf that we're trying
to read, it's possible that we'll be so short on memory that the page
allocation fails.  For a blocking read we'll just wait, but for
readahead we simply dump all the pages we've collected so far.

Unfortunately, after dumping the pages we neglect to clear the
_XBF_PAGES state, which means that the subsequent call to xfs_buf_free
thinks that b_pages still points to pages we own.  It then double-frees
the b_pages pages.

This results in screaming about negative page refcounts from the memory
manager, which xfs oughtn't be triggering.  To reproduce this case,
mount a filesystem where the size of the inodes far outweighs the
availalble memory (a ~500M inode filesystem on a VM with 300MB memory
did the trick here) and run bulkstat in parallel with other memory
eating processes to put a huge load on the system.  The "check summary"
phase of xfs_scrub also works for this purpose.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

2aa6ba7b

xfs: extsize hints are not unlikely in xfs_bmap_btalloc · 493611eb

由 Christoph Hellwig 提交于 1月 25, 2017

With COW files they are the hotpath, just like for files with the
extent size hint attribute.  We really shouldn't micro-manage anything
but failure cases with unlikely.

Additionally Arnd Bergmann recently reported that one of these two
unlikely annotations causes link failures together with an upcoming
kernel instrumentation patch, so let's get rid of it ASAP.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

493611eb

25 1月, 2017 4 次提交

xfs: remove racy hasattr check from attr ops · 5a93790d

由 Brian Foster 提交于 1月 25, 2017

xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
away a lock cycle in cases where the fork does not exist or is otherwise
empty. This check is not safe, however, because an attribute fork short
form to extent format conversion includes a transient state that causes
the xfs_inode_hasattr() check to fail. Specifically,
xfs_attr_shortform_to_leaf() creates an empty extent format attribute
fork and then adds the existing shortform attributes to it.

This means that lookup of an existing xattr can spuriously return
-ENOATTR when racing against a setxattr that causes the associated
format conversion. This was originally reproduced by an untar on a
particularly configured glusterfs volume, but can also be reproduced on
demand with properly crafted xattr requests.

The format conversion occurs under the exclusive ilock. xfs_attr_get()
and xfs_attr_remove() already have the proper locking and checks further
down in the functions to handle this situation correctly. Drop the
unlocked checks to avoid the spurious failure and rely on the existing
logic.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

5a93790d

xfs: use per-AG reservations for the finobt · 76d771b4

由 Christoph Hellwig 提交于 1月 25, 2017

Currently we try to rely on the global reserved block pool for block
allocations for the free inode btree, but I have customer reports
(fairly complex workload, need to find an easier reproducer) where that
is not enough as the AG where we free an inode that requires a new
finobt block is entirely full.  This causes us to cancel a dirty
transaction and thus a file system shutdown.

I think the right way to guard against this is to treat the finot the same
way as the refcount btree and have a per-AG reservations for the possible
worst case size of it, and the patch below implements that.

Note that this could increase mount times with large finobt trees.  In
an ideal world we would have added a field for the number of finobt
fields to the AGI, similar to what we did for the refcount blocks.
We should do add it next time we rev the AGI or AGF format by adding
new fields.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

76d771b4

xfs: only update mount/resv fields on success in __xfs_ag_resv_init · 4dfa2b84

由 Christoph Hellwig 提交于 1月 25, 2017

Try to reserve the blocks first and only then update the fields in
or hanging off the mount structure.  This way we can call __xfs_ag_resv_init
again after a previous failure.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

4dfa2b84

xfs: verify dirblocklog correctly · 83d230eb

由 Darrick J. Wong 提交于 1月 23, 2017

sb_dirblklog is added to sb_blocklog to compute the directory block size
in bytes.  Therefore, we must compare the sum of both those values
against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

83d230eb

24 1月, 2017 1 次提交

xfs: fix COW writeback race · d2b3964a

由 Christoph Hellwig 提交于 1月 20, 2017

Due to the way how xfs_iomap_write_allocate tries to convert the whole
found extents from delalloc to real space we can run into a race
condition with multiple threads doing writes to this same extent.
For the non-COW case that is harmless as the only thing that can happen
is that we call xfs_bmapi_write on an extent that has already been
converted to a real allocation.  For COW writes where we move the extent
from the COW to the data fork after I/O completion the race is, however,
not quite as harmless.  In the worst case we are now calling
xfs_bmapi_write on a region that contains hole in the COW work, which
will trip up an assert in debug builds or lead to file system corruption
in non-debug builds.  This seems to be reproducible with workloads of
small O_DSYNC write, although so far I've not managed to come up with
a with an isolated reproducer.

The fix for the issue is relatively simple:  tell xfs_bmapi_write
that we are only asked to convert delayed allocations and skip holes
in that case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

d2b3964a

19 1月, 2017 1 次提交

xfs: fix xfs_mode_to_ftype() prototype · fd29f7af

由 Arnd Bergmann 提交于 1月 18, 2017

A harmless warning just got introduced:

fs/xfs/libxfs/xfs_dir2.h:40:8: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]

Removing the 'const' modifier avoids the warning and has no
other effect.

Fixes: 1fc4d33f ("xfs: replace xfs_mode_to_ftype table with switch statement")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

fd29f7af

18 1月, 2017 7 次提交

xfs: don't wrap ID in xfs_dq_get_next_id · 657bdfb7

由 Eric Sandeen 提交于 1月 17, 2017

The GETNEXTQOTA ioctl takes whatever ID is sent in,
and looks for the next active quota for an user
equal or higher to that ID.

But if we are at the maximum ID and then ask for the "next"
one, we may wrap back to zero.  In this case, userspace
may loop forever, because it will start querying again
at zero.

We'll fix this in userspace as well, but for the kernel,
return -ENOENT if we ask for the next quota ID
past UINT_MAX so the caller knows to stop.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

657bdfb7

xfs: sanity check inode di_mode · a324cbf1

由 Amir Goldstein 提交于 1月 17, 2017

Check for invalid file type in xfs_dinode_verify()
and fail to load the inode structure from disk.
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

a324cbf1

xfs: sanity check inode mode when creating new dentry · fab8eef8

由 Amir Goldstein 提交于 1月 17, 2017

The helper xfs_dentry_to_name() is used by 2 different
classes of callers: Callers that pass zero mode and don't care
about the returned name.type field and Callers that pass
non zero mode and do care about the name.type field.

Change xfs_dentry_to_name() to not take the mode argument and
change the call sites of the first class to not pass the mode
argument.

Create a new helper xfs_dentry_mode_to_name() which does pass
the mode argument and returns -EFSCORRUPTED if mode is invalid.
Callers that translate non zero mode to on-disk file type now
check the return value and will export the error to user instead
of staging an invalid file type to be written to directory entry.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

fab8eef8

xfs: replace xfs_mode_to_ftype table with switch statement · 1fc4d33f

由 Amir Goldstein 提交于 1月 17, 2017

The size of the xfs_mode_to_ftype[] conversion table
was too small to handle an invalid value of mode=S_IFMT.

Instead of fixing the table size, replace the conversion table
with a conversion helper that uses a switch statement.
Suggested-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

1fc4d33f

xfs: add missing include dependencies to xfs_dir2.h · b597dd53

由 Amir Goldstein 提交于 1月 17, 2017

xfs_dir2.h dereferences some data types in inline functions
and fails to include those type definitions, e.g.:
xfs_dir2_data_aoff_t, struct xfs_da_geometry.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

b597dd53

xfs: sanity check directory inode di_size · 3c6f46ea

由 Amir Goldstein 提交于 1月 17, 2017

This changes fixes an assertion hit when fuzzing on-disk
i_mode values.

The easy case to fix is when changing an empty file
i_mode to S_IFDIR. In this case, xfs_dinode_verify()
detects an illegal zero size for directory and fails
to load the inode structure from disk.

For the case of non empty file whose i_mode is changed
to S_IFDIR, the ASSERT() statement in xfs_dir2_isblock()
is replaced with return -EFSCORRUPTED, to avoid interacting
with corrupted jusk also when XFS_DEBUG is disabled.
Suggested-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

3c6f46ea

xfs: make the ASSERT() condition likely · bf46ecc3

由 Amir Goldstein 提交于 1月 17, 2017

The ASSERT() condition is the normal case, not the exception,
so testing the condition should be likely(), not unlikely().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

bf46ecc3

12 1月, 2017 1 次提交

xfs: Timely free truncated dirty pages · 0a417b8d

由 Jan Kara 提交于 1月 11, 2017

Commit 99579cce "xfs: skip dirty pages in ->releasepage()" started
to skip dirty pages in xfs_vm_releasepage() which also has the effect
that if a dirty page is truncated, it does not get freed by
block_invalidatepage() and is lingering in LRU list waiting for reclaim.
So a simple loop like:

while true; do
	dd if=/dev/zero of=file bs=1M count=100
	rm file
done

will keep using more and more memory until we hit low watermarks and
start pagecache reclaim which will eventually reclaim also the truncate
pages. Keeping these truncated (and thus never usable) pages in memory
is just a waste of memory, is unnecessarily stressing page cache
reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
prematurely.

So instead of just skipping dirty pages in xfs_vm_releasepage(), return
to old behavior of skipping them only if they have delalloc or unwritten
buffers and fix the spurious warnings by warning only if the page is
clean.

CC: stable@vger.kernel.org
CC: Brian Foster <bfoster@redhat.com>
CC: Vlastimil Babka <vbabka@suse.cz>
Reported-by: NPetr Tůma <petr.tuma@d3s.mff.cuni.cz>
Fixes: 99579cceSigned-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

0a417b8d

10 1月, 2017 5 次提交

xfs: don't print warnings when xfs_log_force fails · 84a4620c

由 Christoph Hellwig 提交于 1月 09, 2017

There are only two reasons for xfs_log_force / xfs_log_force_lsn to fail:
one is an I/O error, for which xlog_bdstrat already logs a warning, and
the second is an already shutdown log due to a previous I/O errors.  In
the latter case we'll already have a previous indication for the actual
error, but the large stream of misleading warnings from xfs_log_force
will probably scroll it out of the message buffer.

Simply removing the warnings thus makes the XFS log reporting significantly
better.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

84a4620c

xfs: don't rely on ->total in xfs_alloc_space_available · 12ef8301

由 Christoph Hellwig 提交于 1月 09, 2017

->total is a bit of an odd parameter passed down to the low-level
allocator all the way from the high-level callers.  It's supposed to
contain the maximum number of blocks to be allocated for the whole
transaction [1].

But in xfs_iomap_write_allocate we only convert existing delayed
allocations and thus only have a minimal block reservation for the
current transaction, so xfs_alloc_space_available can't use it for
the allocation decisions.  Use the maximum of args->total and the
calculated block requirement to make a decision.  We probably should
get rid of args->total eventually and instead apply ->minleft more
broadly, but that will require some extensive changes all over.

[1] which creates lots of confusion as most callers don't decrement it
once doing a first allocation.  But that's for a separate series.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

12ef8301

xfs: adjust allocation length in xfs_alloc_space_available · 54fee133

由 Christoph Hellwig 提交于 1月 09, 2017

We must decide in xfs_alloc_fix_freelist if we can perform an
allocation from a given AG is possible or not based on the available
space, and should not fail the allocation past that point on a
healthy file system.

But currently we have two additional places that second-guess
xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
maxlen parameter to remove the reservation before doing the
allocation (but ignores the various minium freespace requirements),
and xfs_alloc_fix_minleft tries to fix up the allocated length
after we've found an extent, but ignores the reservations and also
doesn't take the AGFL into account (and thus fails allocations
for not matching minlen in some cases).

Remove all these later fixups and just correct the maxlen argument
inside xfs_alloc_fix_freelist once we have the AGF buffer locked.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

54fee133

xfs: fix bogus minleft manipulations · 255c5162

由 Christoph Hellwig 提交于 1月 09, 2017

We can't just set minleft to 0 when we're low on space - that's exactly
what we need minleft for: to protect space in the AG for btree block
allocations when we are low on free space.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

255c5162

xfs: bump up reserved blocks in xfs_alloc_set_aside · 5149fd32

由 Christoph Hellwig 提交于 1月 09, 2017

Setting aside 4 blocks globally for bmbt splits isn't all that useful,
as different threads can allocate space in parallel.  Bump it to 4
blocks per AG to allow each thread that is currently doing an
allocation to dip into it separately.  Without that we may no have
enough reserved blocks if there are enough parallel transactions
in an almost out space file system that all run into bmap btree
splits.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

5149fd32

04 1月, 2017 4 次提交

xfs: fix max_retries _show and _store functions · ff97f239

由 Carlos Maiolino 提交于 1月 03, 2017

max_retries _show and _store functions should test against cfg->max_retries,
not cfg->retry_timeout
Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

ff97f239

xfs: fix crash and data corruption due to removal of busy COW extents · a1b7a4de

由 Christoph Hellwig 提交于 1月 03, 2017

There is a race window between write_cache_pages calling
clear_page_dirty_for_io and XFS calling set_page_writeback, in which
the mapping for an inode is tagged neither as dirty, nor as writeback.

If the COW shrinker hits in exactly that window we'll remove the delayed
COW extents and writepages trying to write it back, which in release
kernels will manifest as corruption of the bmap btree, and in debug
kernels will trip the ASSERT about now calling xfs_bmapi_write with the
COWFORK flag for holes. A complex customer load manages to hit this
window fairly reliably, probably by always having COW writeback in flight
while the cow shrinker runs.

This patch adds another check for having the I_DIRTY_PAGES flag set,
which is still set during this race window. While this fixes the problem
I'm still not overly happy about the way the COW shrinker works as it
still seems a bit fragile.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

a1b7a4de

xfs: use the actual AG length when reserving blocks · 20e73b00

由 Darrick J. Wong 提交于 1月 03, 2017

We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.
Complained-about-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

20e73b00

xfs: fix double-cleanup when CUI recovery fails · 7a21272b

由 Darrick J. Wong 提交于 1月 03, 2017

Dan Carpenter reported a double-free of rcur if _defer_finish fails
while we're recovering CUI items.  Fix the error recovery to prevent
this.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

7a21272b

25 12月, 2016 1 次提交

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

23 12月, 2016 1 次提交

vfs: fix isize/pos/len checks for reflink & dedupe · 22725ce4

由 Darrick J. Wong 提交于 12月 19, 2016

Strengthen the checking of pos/len vs. i_size, clarify the return values
for the clone prep function, and remove pointless code.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

22725ce4

10 12月, 2016 2 次提交

vfs: refactor clone/dedupe_file_range common functions · 876bec6f

由 Darrick J. Wong 提交于 12月 09, 2016

Hoist both the XFS reflink inode state and preparation code and the XFS
file blocks compare functions into the VFS so that ocfs2 can take
advantage of it for reflink and dedupe.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

876bec6f

fs: try to clone files first in vfs_copy_file_range · a76b5b04

由 Christoph Hellwig 提交于 12月 09, 2016

A clone is a perfectly fine implementation of a file copy, so most
file systems just implement the copy that way. Instead of duplicating
this logic move it to the VFS. Currently btrfs and XFS implement copies
the same way as clones and there is no behavior change for them, cifs
only implements clones and grow support for copy_file_range with this
patch. NFS implements both, so this will allow copy_file_range to work
on servers that only implement CLONE and be lot more efficient on servers
that implements CLONE and COPY.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

a76b5b04

09 12月, 2016 3 次提交

vfs: remove ".readlink = generic_readlink" assignments · dfeef688

由 Miklos Szeredi 提交于 12月 09, 2016

If .readlink == NULL implies generic_readlink().

Generated by:

to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

dfeef688

vfs: replace calling i_op->readlink with vfs_readlink() · fd4a0edf

由 Miklos Szeredi 提交于 12月 09, 2016

Also check d_is_symlink() in callers instead of inode->i_op->readlink
because following patches will allow NULL ->readlink for symlinks.
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

fd4a0edf

xfs: nuke unused tracepoint definitions · 9875258c

由 Eric Sandeen 提交于 12月 09, 2016

This is all unused code, so remove it.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

9875258c

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功