提交 · 5706278758e334bf6a15f57c18dc16df19c83957 · openeuler / Kernel

24 10月, 2013 2 次提交

xfs: unify directory/attribute format definitions · 57062787

由 Dave Chinner 提交于 10月 15, 2013

The on-disk format definitions for the directory and attribute
structures are spread across 3 header files right now, only one of
which is dedicated to defining on-disk structures and their
manipulation (xfs_dir2_format.h). Pull all the format definitions
into a single header file - xfs_da_format.h - and switch all the
code over to point at that.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

57062787

xfs: create a shared header file for format-related information · 70a9883c

由 Dave Chinner 提交于 10月 23, 2013

All of the buffer operations structures are needed to be exported
for xfs_db, so move them all to a common location rather than
spreading them all over the place. They are verifying the on-disk
format, so while xfs_format.h might be a good place, it is not part
of the on disk format.

Hence we need to create a new header file that we centralise these
related definitions. Start by moving the bffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

70a9883c

22 10月, 2013 5 次提交

xfs: fold xfs_change_file_space into xfs_ioc_space · 865e9446

由 Christoph Hellwig 提交于 10月 12, 2013

Now that only one caller of xfs_change_file_space is left it can be merged
into said caller.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

865e9446

xfs: simplify the fallocate path · 83aee9e4

由 Christoph Hellwig 提交于 10月 12, 2013

Call xfs_alloc_file_space or xfs_free_file_space directly from
xfs_file_fallocate instead of going through xfs_change_file_space.

This simplified the code by removing the unessecary marshalling of the
arguments into an xfs_flock64_t structure and allows removing checks that
are already done in the VFS code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

83aee9e4

xfs: always hold the iolock when calling xfs_change_file_space · 5f8aca8b

由 Christoph Hellwig 提交于 10月 12, 2013

Currently fallocate always holds the iolock when calling into
xfs_change_file_space, while the ioctl path lets some of the lower level
functions take it, but leave it out in others.

This patch makes sure the ioctl path also always holds the iolock and
thus introduces consistent locking for the preallocation operations while
simplifying the code and allowing to kill the now unused XFS_ATTR_NOLOCK
flag.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

5f8aca8b

xfs: remove the unused XFS_ATTR_NONBLOCK flag · 001a3e73

由 Christoph Hellwig 提交于 10月 12, 2013

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

001a3e73

xfs: always take the iolock around xfs_setattr_size · 76ca4c23

由 Christoph Hellwig 提交于 10月 14, 2013

There is no reason to conditionally take the iolock inside xfs_setattr_size
when we can let the caller handle it unconditionally, which just incrases
the lock hold time for the case where it was previously taken internally
by a few instructions.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

76ca4c23

18 10月, 2013 3 次提交

xfs: don't break from growfs ag update loop on error · 59e5a0e8

由 Eric Sandeen 提交于 10月 11, 2013

When xfs_growfs_data_private() is updating backup superblocks,
it bails out on the first error encountered, whether reading or
writing:

* If we get an error writing out the alternate superblocks,
* just issue a warning and continue.  The real work is
* already done and committed.

This can cause a problem later during repair, because repair
looks at all superblocks, and picks the most prevalent one
as correct.  If we bail out early in the backup superblock
loop, we can end up with more "bad" matching superblocks than
good, and a post-growfs repair may revert the filesystem to
the old geometry.

With the combination of superblock verifiers and old bugs,
we're more likely to encounter read errors due to verification.

And perhaps even worse, we don't even properly write any of the
newly-added superblocks in the new AGs.

Even with this change, growfs will still say:

  xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
  data blocks changed from 319815680 to 335216640

which might be confusing to the user, but it at least communicates
that something has gone wrong, and dmesg will probably highlight
the need for an xfs_repair.

And this is still best-effort; if verifiers fail on more than
half the backup supers, they may still "win" - but that's probably
best left to repair to more gracefully handle by doing its own
strict verification as part of the backup super "voting."
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Acked-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> 
Signed-off-by: NBen Myers <bpm@sgi.com>

59e5a0e8

xfs: don't emit corruption noise on fs probes · 31625f28

由 Eric Sandeen 提交于 10月 11, 2013

If we get EWRONGFS due to probing of non-xfs filesystems,
there's no need to issue the scary corruption error and backtrace.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBen Myers <bpm@sgi.com>

31625f28

xfs: remove newlines from strings passed to __xfs_printk · 08e96e1a

由 Eric Sandeen 提交于 10月 11, 2013

__xfs_printk adds its own "\n".  Having it in the original string
leads to unintentional blank lines from these messages.

Most format strings have no newline, but a few do, leading to
i.e.:

[ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119911] 
[ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119919] 

Fix them all.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

08e96e1a

17 10月, 2013 1 次提交

xfs: prevent deadlock trying to cover an active log · 2c6e24ce

由 Dave Chinner 提交于 10月 15, 2013

Recent analysis of a deadlocked XFS filesystem from a kernel
crash dump indicated that the filesystem was stuck waiting for log
space. The short story of the hang on the RHEL6 kernel is this:

	- the tail of the log is pinned by an inode
	- the inode has been pushed by the xfsaild
	- the inode has been flushed to it's backing buffer and is
	  currently flush locked and hence waiting for backing
	  buffer IO to complete and remove it from the AIL
	- the backing buffer is marked for write - it is on the
	  delayed write queue
	- the inode buffer has been modified directly and logged
	  recently due to unlinked inode list modification
	- the backing buffer is pinned in memory as it is in the
	  active CIL context.
	- the xfsbufd won't start buffer writeback because it is
	  pinned
	- xfssyncd won't force the log because it sees the log as
	  needing to be covered and hence wants to issue a dummy
	  transaction to move the log covering state machine along.

Hence there is no trigger to force the CIL to the log and hence
unpin the inode buffer and therefore complete the inode IO, remove
it from the AIL and hence move the tail of the log along, allowing
transactions to start again.

Mainline kernels also have the same deadlock, though the signature
is slightly different - the inode buffer never reaches the delayed
write lists because xfs_buf_item_push() sees that it is pinned and
hence never adds it to the delayed write list that the xfsaild
flushes.

There are two possible solutions here. The first is to simply force
the log before trying to cover the log and so ensure that the CIL is
emptied before we try to reserve space for the dummy transaction in
the xfs_log_worker(). While this might work most of the time, it is
still racy and is no guarantee that we don't get stuck in
xfs_trans_reserve waiting for log space to come free. Hence it's not
the best way to solve the problem.

The second solution is to modify xfs_log_need_covered() to be aware
of the CIL. We only should be attempting to cover the log if there
is no current activity in the log - covering the log is the process
of ensuring that the head and tail in the log on disk are identical
(i.e. the log is clean and at idle). Hence, by definition, if there
are items in the CIL then the log is not at idle and so we don't
need to attempt to cover it.

When we don't need to cover the log because it is active or idle, we
issue a log force from xfs_log_worker() - if the log is idle, then
this does nothing.  However, if the log is active due to there being
items in the CIL, it will force the items in the CIL to the log and
unpin them.

In the case of the above deadlock scenario, instead of
xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
cover the log, it will instead force the log, thereby unpinning the
inode buffer, allowing IO to be issued and complete and hence
removing the inode that was pinning the tail of the log from the
AIL. At that point, everything will start moving along again. i.e.
the xfs_log_worker turns back into a watchdog that can alleviate
deadlocks based around pinned items that prevent the tail of the log
from being moved...
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

2c6e24ce

09 10月, 2013 5 次提交

xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE · 74564fb4

由 Brian Foster 提交于 9月 20, 2013

The xfs_inactive() return value is meaningless. Turn xfs_inactive()
into a void function and clean up the error handling appropriately.
Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant
to Linux.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

74564fb4

xfs: push down inactive transaction mgmt for ifree · 88877d2b

由 Brian Foster 提交于 9月 20, 2013

Push the inode free work performed during xfs_inactive() down into
a new xfs_inactive_ifree() helper. This clears xfs_inactive() from
all inode locking and transaction management more directly
associated with freeing the inode xattrs, extents and the inode
itself.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

88877d2b

xfs: push down inactive transaction mgmt for truncate · f7be2d7f

由 Brian Foster 提交于 9月 20, 2013

Create the new xfs_inactive_truncate() function to handle the
truncate portion of xfs_inactive(). Push the locking and
transaction management into the new function.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

f7be2d7f

xfs: push down inactive transaction mgmt for remote symlinks · 36b21dde

由 Brian Foster 提交于 9月 20, 2013

Push down the transaction management for remote symlinks from
xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is
cleaned up to avoid transaction management intended for the
calling context (i.e., trans duplication, reservation, item
attachment).
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

36b21dde

xfs: add the inode directory type support to XFS_IOC_FSGEOM · 2900a579

由 Mark Tinguely 提交于 10月 02, 2013

Add the inode type directory type support to XFS_IOC_FSGEOM
so that xfs_repair/xfs_info knows if the superblock v4 filesystem
enabled the feature.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

2900a579

02 10月, 2013 3 次提交

xfs: remove usage of is_bad_inode · d948709b

由 Ben Myers 提交于 9月 10, 2013

XFS never calls mark_inode_bad or iget_failed, so it will never see a
bad inode.  Remove all checks for is_bad_inode because they are
unnecessary.
Signed-off-by: NBen Myers <bpm@sgi.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

d948709b

xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() · 17ec81c1

由 Jie Liu 提交于 9月 22, 2013

At xfs_iext_realloc_direct(), the new_size is changed by adding
if_bytes if originally the extent records are stored at the inline
extent buffer, and we have to switch from it to a direct extent
list for those new allocated extents, this is wrong. e.g,

Create a file with three extents which was showing as following,

xfs_io -f -c "truncate 100m" /xfs/testme

for i in $(seq 0 5 10); do
	offset=$(($i * $((1 << 20))))
	xfs_io -c "pwrite $offset 1m" /xfs/testme
done

Inline
------
irec:	if_bytes	bytes_diff	new_size
1st	0		16		16
2nd	16		16		32

Switching
---------						rnew_size
3rd	32		16		48 + 32 = 80	roundup=128

In this case, the desired value of new_size should be 48, and then
it will be roundup to 64 and be assigned to rnew_size.

However, this issue has been covered by resetting the if_bytes to
the new_size which is calculated at the begnning of xfs_iext_add()
before leaving out this function, and in turn make the rnew_size
correctly again. Hence, this can not be detected via xfstestes.

This patch fix above problem and revise the new_size comments at
xfs_iext_realloc_direct() to make it more readable.  Also, fix the
comments while switching from the inline extent buffer to a direct
extent list to reflect this change.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

17ec81c1

xfs: get rid of count from xfs_iomap_write_allocate() · 0799a3e8

由 Jie Liu 提交于 9月 29, 2013

Get rid of function variable count from xfs_iomap_write_allocate() as
it is unused.

Additionally, checkpatch warn me of the following for this change:
WARNING: extern prototypes should be avoided in .h files
+extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,

So this patch also remove all extern function prototypes at xfs_iomap.h
to suppress it to make this code style in consistent manner in this file.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

0799a3e8

01 10月, 2013 4 次提交

xfs: Use kmem_free() instead of free() · aaaae980

由 Thierry Reding 提交于 10月 01, 2013

This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

aaaae980

xfs: fix memory leak in xlog_recover_add_to_trans · 519ccb81

由 tinguely@sgi.com 提交于 9月 27, 2013

Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

519ccb81

xfs: dirent dtype presence is dependent on directory magic numbers · 367993e7

由 Dave Chinner 提交于 9月 30, 2013

The determination of whether a directory entry contains a dtype
field originally was dependent on the filesystem having CRCs
enabled. This meant that the format for dtype beign enabled could be
determined by checking the directory block magic number rather than
doing a feature bit check. This was useful in that it meant that we
didn't need to pass a struct xfs_mount around to functions that
were already supplied with a directory block header.

Unfortunately, the introduction of dtype fields into the v4
structure via a feature bit meant this "use the directory block
magic number" method of discriminating the dirent entry sizes is
broken. Hence we need to convert the places that use magic number
checks to use feature bit checks so that they work correctly and not
by chance.

The current code works on v4 filesystems only because the dirent
size roundup covers the extra byte needed by the dtype field in the
places where this problem occurs.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

367993e7

xfs: lockdep needs to know about 3 dquot-deep nesting · f112a049

由 Dave Chinner 提交于 9月 30, 2013

Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.
Reported-by: NMichael L. Semon <mlsemon35@gmail.com>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

f112a049

26 9月, 2013 1 次提交

xfs: fix node forward in xfs_node_toosmall · 997def25

由 Mark Tinguely 提交于 9月 23, 2013

Commit f5ea1100 cleans up the disk to host conversions for
node directory entries, but because a variable is reused in
xfs_node_toosmall() the next node is not correctly found.
If the original node is small enough (<= 3/8 of the node size),
this change may incorrectly cause a node collapse when it should
not. That will cause an assert in xfstest generic/319:

   Assertion failed: first <= last && last < BBTOB(bp->b_length),
   file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569

Keep the original node header to get the correct forward node.

(When a node is considered for a merge with a sibling, it overwrites the
 sibling pointers of the original incore nodehdr with the sibling's
 pointers.  This leads to loop considering the original node as a merge
 candidate with itself in the second pass, and so it incorrectly
 determines a merge should occur.)
Signed-off-by: NMark Tinguely <tinguely@sgi.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

[v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
	cleaned up whitespace.  -bpm]

997def25

25 9月, 2013 4 次提交

xfs: log recovery lsn ordering needs uuid check · 566055d3

由 Dave Chinner 提交于 9月 24, 2013

After a fair number of xfstests runs, xfs/182 started to fail
regularly with a corrupted directory - a directory read verifier was
failing after recovery because it found a block with a XARM magic
number (remote attribute block) rather than a directory data block.

The first time I saw this repeated failure I did /something/ and the
problem went away, so I was never able to find the underlying
problem. Test xfs/182 failed again today, and I found the root
cause before I did /something else/ that made it go away.

Tracing indicated that the block in question was being correctly
logged, the log was being flushed by sync, but the buffer was not
being written back before the shutdown occurred. Tracing also
indicated that log recovery was also reading the block, but then
never writing it before log recovery invalidated the cache,
indicating that it was not modified by log recovery.

More detailed analysis of the corpse indicated that the filesystem
had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
indicated it was a block from an older filesystem. The reason that
log recovery didn't replay it was that the LSN in the XARM block was
larger than the LSN of the transaction being replayed, and so the
block was not overwritten by log recovery.

Hence, log recovery cant blindly trust the magic number and LSN in
the block - it must verify that it belongs to the filesystem being
recovered before using the LSN. i.e. if the UUIDs don't match, we
need to unconditionally recovery the change held in the log.

This patch was first tested on a block device that was repeatedly
causing xfs/182 to fail with the same failure on the same block with
the same directory read corruption signature (i.e. XARM block). It
did not fail, and hasn't failed since.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

566055d3

xfs: fix XFS_IOC_FREE_EOFBLOCKS definition · b771af2f

由 Dave Chinner 提交于 9月 24, 2013

It uses a kernel internal structure in it's definition rather than
the user visible structure that is passed to the ioctl.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

b771af2f

xfs: asserting lock not held during freeing not valid · b313a5f1

由 Dave Chinner 提交于 9月 24, 2013

When we free an inode, we do so via RCU. As an RCU lookup can occur
at any time before we free an inode, and that lookup takes the inode
flags lock, we cannot safely assert that the flags lock is not held
just before marking it dead and running call_rcu() to free the
inode.

We check on allocation of a new inode structre that the lock is not
held, so we still have protection against locks being leaked and
hence not correctly initialised when allocated out of the slab.
Hence just remove the assert...
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

b313a5f1

xfs: lock the AIL before removing the buffer item · 48852358

由 Dave Chinner 提交于 9月 24, 2013

Regression introduced by commit 46f9d2eb ("xfs: aborted buf items can
be in the AIL") which fails to lock the AIL before removing the
item. Spinlock debugging throws a warning about this.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

48852358

15 9月, 2013 1 次提交

vfs: fix typo in comment in recent dentry work · 05a8252b

由 Linus Torvalds 提交于 9月 15, 2013

Sedat points out that I transposed some letters in "LRU" and wrote "RLU"
instead in one of the new comments explaining the flow.  Let's just fix
it.
Reported-by: NSedat Dilek <sedat.dilek@jpberlin.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

05a8252b

14 9月, 2013 3 次提交

vfs: fix dentry LRU list handling and nr_dentry_unused accounting · 89dc77bc

由 Linus Torvalds 提交于 9月 13, 2013

The LRU list changes interacted badly with our nr_dentry_unused
accounting, and even worse with the new DCACHE_LRU_LIST bit logic.

This introduces helper functions to make sure everything follows the
proper dcache d_lru list rules: the dentry cache is complicated by the
fact that some of the hotpaths don't even want to look at the LRU list
at all, and the fact that we use the same list entry in the dentry for
both the LRU list and for our temporary shrinking lists when removing
things from the LRU.

The helper functions temporarily have some extra sanity checking for the
flag bits that have to match the current LRU state of the dentry.  We'll
remove that before the final 3.12 release, but considering how easy it
is to get wrong, this first cleanup version has some very particular
sanity checking.
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89dc77bc

cifs: Avoid calling unlock_page() twice in cifs_readpage() when using fscache · 466bd31b

由 Sachin Prabhu 提交于 9月 13, 2013

When reading a single page with cifs_readpage(), we make a call to
fscache_read_or_alloc_page() which once done, asynchronously calls
the completion function cifs_readpage_from_fscache_complete(). This
completion function unlocks the page once it has been populated from
cache. The module then attempts to unlock the page a second time in
cifs_readpage() which leads to warning messages.

In case of a successful call to fscache_read_or_alloc_page() we should skip
the second unlock_page() since this will be called by the
cifs_readpage_from_fscache_complete() once the page has been populated by
fscache.

With the modifications to cifs_readpage_worker(), we will need to re-grab the
page lock in cifs_write_begin().

The problem was first noticed when testing new fscache patches for cifs.
https://bugzilla.redhat.com/show_bug.cgi?id=1005737Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

466bd31b

cifs: Do not take a reference to the page in cifs_readpage_worker() · a9e9b7bc

由 Sachin Prabhu 提交于 9月 13, 2013

We do not need to take a reference to the pagecache in
cifs_readpage_worker() since the calling function will have already
taken one before passing the pointer to the page as an argument to the
function.
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

a9e9b7bc

13 9月, 2013 8 次提交

thp: account anon transparent huge pages into NR_ANON_PAGES · 3cd14fcd

由 Kirill A. Shutemov 提交于 9月 12, 2013

We use NR_ANON_PAGES as base for reporting AnonPages to user.  There's
not much sense in not accounting transparent huge pages there, but add
them on printing to user.

Let's account transparent huge pages in NR_ANON_PAGES in the first place.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Ning Qu <quning@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3cd14fcd

truncate: drop 'oldsize' truncate_pagecache() parameter · 7caef267

由 Kirill A. Shutemov 提交于 9月 12, 2013

truncate_pagecache() doesn't care about old size since commit
cedabed4 ("vfs: Fix vmtruncate() regression").  Let's drop it.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7caef267

vfs: make d_path() get the root path under RCU · 68f0d9d9

由 Linus Torvalds 提交于 9月 12, 2013

This avoids the spinlocks and refcounts in the d_path() sequence too
(used by /proc and various other entities).  See commit 8b19e341 for
the equivalent getcwd() system call path.

And unlike getcwd(), d_path() doesn't copy the result to user space, so
I don't need to fear _that_ particular bug happening again.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68f0d9d9

vfs: use __getname/__putname for getcwd() system call · 3272c544

由 Linus Torvalds 提交于 9月 12, 2013

It's a pathname.  It should use the pathname allocators and
deallocators, and PATH_MAX instead of PAGE_SIZE.  Never mind that the
two are commonly the same.

With this, the allocations scale up nicely too, and I can do getcwd()
system calls at a rate of about 300M/s, with no lock contention
anywhere.

Of course, nobody sane does that, especially since getcwd() is
traditionally a very slow operation in Unix.  But this was also the
simplest way to benchmark the prepend_path() improvements by Waiman, and
once I saw the profiles I couldn't leave it well enough alone.

But apart from being an performance improvement (from using per-cpu slab
allocators instead of the raw page allocator), it's actually a valid and
real cleanup.
Signed-off-by: NLinus "OCD" Torvalds <torvalds@linux-foundation.org>

3272c544

vfs: don't copy things to user space holding the rcu readlock · ff812d72

由 Linus Torvalds 提交于 9月 12, 2013

Oops.  That wasn't very smart.  We don't actually need the RCU lock any
more by the time we copy the cwd string to user space, but I had
stupidly surrounded the whole thing with it.

Introduced by commit 8b19e341 ("vfs: make getcwd() get the root and
pwd path under rcu")

Is-a-big-hairy-idiot: Linus Torvalds <torvalds@linux-foundation.org>

ff812d72

vfs: make getcwd() get the root and pwd path under rcu · 8b19e341

由 Linus Torvalds 提交于 9月 12, 2013

This allows us to skip all the crazy spinlocks and reference count
updates, and instead use the fs sequence read-lock to get an atomic
snapshot of the root and cwd information.

We might want to make the rule that "prepend_path()" is always called
with the RCU lock held, but the RCU lock nests fine and this is the
minimal fix.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b19e341

vfs: move get_fs_root_and_pwd() to single caller · 5762482f

由 Linus Torvalds 提交于 9月 12, 2013

Let's not pollute the include files with inline functions that are only
used in a single place. Especially not if we decide we might want to
change the semantics of said function to make it more efficient..
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5762482f

dcache: get/release read lock in read_seqbegin_or_lock() & friend · 18129977

由 Waiman Long 提交于 9月 12, 2013

This patch modifies read_seqbegin_or_lock() and need_seqretry() to use
newly introduced read_seqlock_excl() and read_sequnlock_excl()
primitives so that they won't change the sequence number even if they
fall back to take the lock.  This is OK as no change to the protected
data structure is being made.

It will prevent one fallback to lock taking from cascading into a series
of lock taking reducing performance because of the sequence number
change.  It will also allow other sequence readers to go forward while
an exclusive reader lock is taken.

This patch also updates some of the inaccurate comments in the code.
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

18129977

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功