提交 · 9902af79c01a8e39bb99b922fa3eef6d4ea23d69 · openeuler / Kernel

03 5月, 2016 17 次提交

parallel lookups: actual switch to rwsem · 9902af79

由 Al Viro 提交于 4月 15, 2016

ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9902af79

parallel lookups machinery, part 4 (and last) · d9171b93

由 Al Viro 提交于 4月 15, 2016

If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d9171b93

parallel lookups machinery, part 3 · 94bdd655

由 Al Viro 提交于 4月 15, 2016

We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

94bdd655

parallel lookups machinery, part 2 · 84e710da

由 Al Viro 提交于 4月 15, 2016

We'll need to verify that there's neither a hashed nor in-lookup
dentry with desired parent/name before adding to in-lookup set.

One possible solution would be to hold the parent's ->d_lock through
both checks, but while the in-lookup set is relatively small at any
time, dcache is not.  And holding the parent's ->d_lock through
something like __d_lookup_rcu() would suck too badly.

So we leave the parent's ->d_lock alone, which means that we watch
out for the following scenario:
	* we verify that there's no hashed match
	* existing in-lookup match gets hashed by another process
	* we verify that there's no in-lookup matches and decide
that everything's fine.

Solution: per-directory kinda-sorta seqlock, bumped around the times
we hash something that used to be in-lookup or move (and hash)
something in place of in-lookup.  Then the above would turn into
	* read the counter
	* do dcache lookup
	* if no matches found, check for in-lookup matches
	* if there had been none of those either, check if the
counter has changed; repeat if it has.

The "kinda-sorta" part is due to the fact that we don't have much spare
space in inode.  There is a spare word (shared with i_bdev/i_cdev/i_pipe),
so the counter part is not a problem, but spinlock is a different story.

We could use the parent's ->d_lock, and it would be less painful in
terms of contention, for __d_add() it would be rather inconvenient to
grab; we could do that (using lock_parent()), but...

Fortunately, we can get serialization on the counter itself, and it
might be a good idea in general; we can use cmpxchg() in a loop to
get from even to odd and smp_store_release() from odd to even.

This commit adds the counter and updating logics; the readers will be
added in the next commit.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

84e710da

beginning of transition to parallel lookups - marking in-lookup dentries · 85c7f810

由 Al Viro 提交于 4月 14, 2016

marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
__d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
above.  Bug (WARN_ON, actually) if it reaches the final dput()
or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
(parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

85c7f810

A
__d_add(): don't drop/regain ->d_lock · 0568d705
由 Al Viro 提交于 4月 14, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
0568d705
A
lookup_slow(): bugger off on IS_DEADDIR() from the very beginning · 1936386e
由 Al Viro 提交于 4月 14, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
1936386e

nfs: missing wakeup in nfs_unblock_sillyrename() · d2caaa0a

由 Al Viro 提交于 4月 30, 2016

will be needed as soon as lookups are not serialized by ->i_mutex
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d2caaa0a

make ext2_get_page() and friends work without external serialization · be5b82db

由 Al Viro 提交于 4月 22, 2016

Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logics becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened.  Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

be5b82db

A
ovl_lookup_real(): use lookup_one_len_unlocked() · b9e1d435
由 Al Viro 提交于 4月 14, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
b9e1d435

reconnect_one(): use lookup_one_len_unlocked() · 383d4e8a

由 Al Viro 提交于 4月 14, 2016

... and explain the non-obvious logics in case when lookup yields
a different dentry.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

383d4e8a

A
reiserfs: open-code reiserfs_mutex_lock_safe() in reiserfs_unpack() · 1ae1f3f6
由 Al Viro 提交于 4月 15, 2016
```
... and have it use inode_lock()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
1ae1f3f6
A
orangefs: don't open-code inode_lock/inode_unlock · 5ecfcb26
由 Al Viro 提交于 4月 12, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
5ecfcb26
A
ocfs2: don't open-code inode_lock/inode_unlock · 7b9743eb
由 Al Viro 提交于 4月 12, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
7b9743eb

configfs_detach_prep(): make sure that wait_mutex won't go away · 48f35b7b

由 Al Viro 提交于 4月 12, 2016

grab a reference to dentry we'd got the sucker from, and return
that dentry via *wait, rather than just returning the address of
->i_mutex.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

48f35b7b

A
kernfs: use lookup_one_len_unlocked() · 779b8391
由 Al Viro 提交于 4月 11, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
779b8391
A
security_d_instantiate(): move to the point prior to attaching dentry to inode · b9680917
由 Al Viro 提交于 4月 11, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
b9680917

11 4月, 2016 6 次提交

A
->getxattr(): pass dentry and inode as separate arguments · ce23e640
由 Al Viro 提交于 4月 11, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
ce23e640
A
xattr_handler: pass dentry and inode as separate arguments of ->get() · b296821a
由 Al Viro 提交于 4月 10, 2016
```
... and do not assume they are already attached to each other
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
b296821a

Revert "ext4: allow readdir()'s of large empty directories to be interrupted" · 9f2394c9

由 Linus Torvalds 提交于 4月 10, 2016

This reverts commit 1028b55b.

It's broken: it makes ext4 return an error at an invalid point, causing
the readdir wrappers to write the the position of the last successful
directory entry into the position field, which means that the next
readdir will now return that last successful entry _again_.

You can only return fatal errors (that terminate the readdir directory
walk) from within the filesystem readdir functions, the "normal" errors
(that happen when the readdir buffer fills up, for example) happen in
the iterorator where we know the position of the actual failing entry.

I do have a very different patch that does the "signal_pending()"
handling inside the iterator function where it is allowable, but while
that one passes all the sanity checks, I screwed up something like four
times while emailing it out, so I'm not going to commit it today.

So my track record is not good enough, and the stars will have to align
better before that one gets committed.  And it would be good to get some
review too, of course, since celestial alignments are always an iffy
debugging model.

IOW, let's just revert the commit that caused the problem for now.
Reported-by: NGreg Thelen <gthelen@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9f2394c9

reiserfs: switch to generic_{get,set,remove}xattr() · 79a628d1

由 Al Viro 提交于 4月 10, 2016

reiserfs_xattr_[sg]et() will fail with -EOPNOTSUPP for V1 inodes anyway,
and all reiserfs instances of ->[sg]et() call it and so does ->set_acl().

Checks for name length in the instances had been bogus; they should've
been "bugger off if it's _exactly_ the prefix" (as generic would
do on its own) and not "bugger off if it's shorter than the prefix" -
that can't happen.

xattr_full_name() is needed to adjust for the fact that generic instances
will skip the prefix in the name passed to ->[gs]et(); reiserfs homegrown
analogues didn't.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

79a628d1

cifs: kill more bogus checks in ->...xattr() methods · 5fdccfef

由 Al Viro 提交于 4月 10, 2016

none of that stuff can ever be called for NULL or negative
dentry.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5fdccfef

A
don't bother with ->d_inode->i_sb - it's always equal to ->d_sb · fc64005c
由 Al Viro 提交于 4月 10, 2016
```
... and neither can ever be NULL
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
fc64005c

09 4月, 2016 7 次提交

orangefs: remove unused variable · e56f4981

由 Martin Brandenburg 提交于 4月 04, 2016

Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

e56f4981

orangefs: Add KERN_<LEVEL> to gossip_<level> macros · 1917a693

由 Joe Perches 提交于 3月 27, 2016

Emit the logging messages at the appropriate levels.

Miscellanea:

o Change format to fmt
o Use the more common ##__VA_ARGS__
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

1917a693

orangefs: strncpy -> strscpy · 2eacea74

由 Martin Brandenburg 提交于 4月 08, 2016

It would have been possible for a rogue client-core to send in a symlink
target which is not NUL terminated. This returns EIO if the client-core
gives us corrupt data.

Leave debugfs and superblock code as is for now.

Other dcache.c and namei.c strncpy instances are safe because
ORANGEFS_NAME_MAX = NAME_MAX + 1; there is always enough space for a
name plus a NUL byte.
Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

2eacea74

orangefs: clean up truncate ctime and mtime setting · f83140c1

由 Martin Brandenburg 提交于 4月 04, 2016

The ctime and mtime are always updated on a successful ftruncate and
only updated on a successful truncate where the size changed.

We handle the ``if the size changed'' bit.

This matches FUSE's behavior.
Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

f83140c1

Orangefs: fix ifnullfree.cocci warnings · 2fa37fd7

由 kbuild test robot 提交于 3月 27, 2016

fs/orangefs/orangefs-debugfs.c:130:2-26: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

NULL check before some freeing functions is not needed.

Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

2fa37fd7

Orangefs: optimize boilerplate code. · a9bb3ba8

由 Mike Marshall 提交于 4月 06, 2016

Suggested by David Binderman <dcb314@hotmail.com>
The former can potentially be a performance win over the latter.

memcpy(d, s, len);
memset(d+len, c, size-len);

memset(d, c, size);
memcpy(d, s, len);
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

a9bb3ba8

Orangefs: xattr.c cleanup · 2d09a2ca

由 Mike Marshall 提交于 4月 06, 2016

1. It is nonsense to test for negative size_t, suggested by
   David Binderman <dcb314@hotmail.com>

2. By the time Orangefs gets called, the vfs has ensured that
   name != NULL, and that buffer and size are sane.
Signed-off-by: NMike Marshall <hubcap@omnibond.com>

2d09a2ca

07 4月, 2016 1 次提交

Btrfs: fix file/data loss caused by fsync after rename and new inode · 56f23fdb

由 Filipe Manana 提交于 3月 30, 2016

If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.

Example scenarios where this happens:
This is reproducible with the following steps, taken from a couple of
test cases written for fstests which are going to be submitted upstream
soon:

   # Scenario 1

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir -p /mnt/a/x
   echo "hello" > /mnt/a/x/foo
   echo "world" > /mnt/a/x/bar
   sync
   mv /mnt/a/x /mnt/a/y
   mkdir /mnt/a/x
   xfs_io -c fsync /mnt/a/x
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and
   the directory "y" does not exist nor do the files "foo" and
   "bar" exist anywhere (neither in "y" nor in "x", nor the root
   nor anywhere).

   # Scenario 2

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/a
   echo "hello" > /mnt/a/foo
   sync
   mv /mnt/a/foo /mnt/a/bar
   echo "world" > /mnt/a/foo
   xfs_io -c fsync /mnt/a/foo
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and the
   file "bar" does not exists anymore. A file with the name "foo"
   exists and it matches the second file we created.

Another related problem that does not involve file/data loss is when a
new inode is created with the name of a deleted snapshot and we fsync it:

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/testdir
   btrfs subvolume snapshot /mnt /mnt/testdir/snap
   btrfs subvolume delete /mnt/testdir/snap
   rmdir /mnt/testdir
   mkdir /mnt/testdir
   xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir
   <power failure>

   The next time the fs is mounted the log replay procedure fails because
   it attempts to delete the snapshot entry (which has dir item key type
   of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
   resulting in the following error that causes mount to fail:

   [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
   [52174.512570] ------------[ cut here ]------------
   [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
   [52174.514681] BTRFS: Transaction aborted (error -2)
   [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
   [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G        W       4.5.0-rc6-btrfs-next-27+ #1
   [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
   [52174.524053]  0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
   [52174.524053]  0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
   [52174.524053]  00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
   [52174.524053] Call Trace:
   [52174.524053]  [<ffffffff81264e93>] dump_stack+0x67/0x90
   [52174.524053]  [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2
   [52174.524053]  [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50
   [52174.524053]  [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff8118f5e9>] ? iput+0xb0/0x284
   [52174.524053]  [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
   [52174.524053]  [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs]
   [52174.524053]  [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs]
   [52174.524053]  [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs]
   [52174.524053]  [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs]
   [52174.524053]  [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs]
   [52174.524053]  [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs]
   [52174.524053]  [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs]
   [52174.524053]  [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffff8119590f>] do_mount+0x8a6/0x9e8
   [52174.524053]  [<ffffffff811358dd>] ? strndup_user+0x3f/0x59
   [52174.524053]  [<ffffffff81195c65>] SyS_mount+0x77/0x9f
   [52174.524053]  [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b
   [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

Fix this by forcing a transaction commit when such cases happen.
This means we check in the commit root of the subvolume tree if there
was any other inode with the same reference when the inode we are
fsync'ing is a new inode (created in the current transaction).

Test cases for fstests, covering all the scenarios given above, were
submitted upstream for fstests:

  * fstests: generic test for fsync after renaming directory
    https://patchwork.kernel.org/patch/8694281/

  * fstests: generic test for fsync after renaming file
    https://patchwork.kernel.org/patch/8694301/

  * fstests: add btrfs test for fsync after snapshot deletion
    https://patchwork.kernel.org/patch/8670671/

Cc: stable@vger.kernel.org
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

56f23fdb

05 4月, 2016 2 次提交

mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0

由 Kirill A. Shutemov 提交于 4月 01, 2016

Mostly direct substitution with occasional adjustment or removing
outdated comments.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ea1754a0

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

04 4月, 2016 7 次提交

btrfs: Reset IO error counters before start of device replacing · 7ccefb98

由 Yauhen Kharuzhy 提交于 3月 29, 2016

If device replace entry was found on disk at mounting and its num_write_errors
stats counter has non-NULL value, then replace operation will never be
finished and -EIO error will be reported by btrfs_scrub_dev() because
this counter is never reset.

 # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
 # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
 Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs
 # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
 ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error

Reset num_write_errors and num_uncorrectable_read_errors counters in the
dev_replace structure before start of replacing.
Signed-off-by: NYauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

7ccefb98

btrfs: Add qgroup tracing · 0f5dcf8d

由 Mark Fasheh 提交于 3月 29, 2016

This patch adds tracepoints to the qgroup code on both the reporting side
(insert_dirty_extents) and the accounting side. Taken together it allows us
to see what qgroup operations have happened, and what their result was.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0f5dcf8d

Btrfs: don't use src fd for printk · c79b4713

由 Josef Bacik 提交于 3月 25, 2016

The fd we pass in may not be on a btrfs file system, so don't try to do
BTRFS_I() on it.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c79b4713

btrfs: fallback to vmalloc in btrfs_compare_tree · 8f282f71

由 David Sterba 提交于 3月 30, 2016

The allocation of node could fail if the memory is too fragmented for a
given node size, practically observed with 64k.

http://article.gmane.org/gmane.comp.file-systems.btrfs/54689Reported-and-tested-by: NJean-Denis Girard <jd.girard@sysnux.pf>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

8f282f71

btrfs: handle non-fatal errors in btrfs_qgroup_inherit() · 918c2ee1

由 Mark Fasheh 提交于 3月 30, 2016

create_pending_snapshot() will go readonly on _any_ error return from
btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by
just making a snapshot and asking it to inherit from an invalid qgroup. For
example:

$ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo

Will cause a transaction abort.

Fix this by only throwing errors in btrfs_qgroup_inherit() when we know
going readonly is acceptable.

The following xfstests test case reproduces this bug:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  here=`pwd`
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
  	cd /
  	rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # remove previous $seqres.full before test
  rm -f $seqres.full

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch

  rm -f $seqres.full

  _scratch_mkfs
  _scratch_mount
  _run_btrfs_util_prog quota enable $SCRATCH_MNT
  # The qgroup '1/10' does not exist and should be silently ignored
  _run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1

  _scratch_unmount

  echo "Silence is golden"

  status=0
  exit
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

918c2ee1

btrfs: Output more info for enospc_debug mount option · 0305bc27

由 Qu Wenruo 提交于 3月 23, 2016

As one user in mail list report reproducible balance ENOSPC error, it's
better to add more debug info for enospc_debug mount option.
Reported-by: NMarc Haber <mh+linux-btrfs@zugschlus.de>
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0305bc27

Btrfs: fix invalid reference in replace_path · 264813ac

由 Liu Bo 提交于 3月 21, 2016

Dan Carpenter's static checker has found this error, it's introduced by
commit 64c043de
("Btrfs: fix up read_tree_block to return proper error")

It's really supposed to 'break' the loop on error like others.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

264813ac

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功