Commits · c568d68341be7030f5647def68851e469b21ca11 · openanolis / cloud-kernel

16 Sep, 2016 3 commits

locks: fix file locking on overlayfs · c568d683

Authored by Miklos Szeredi on Sep 16, 2016

This patch allows flock, posix locks, ofd locks and leases to work
correctly on overlayfs.

Instead of using the underlying inode for storing lock context use the
overlay inode.  This allows locks to be persistent across copy-up.

This is done by introducing locks_inode() helper and using it instead of
file_inode() to get the inode in locking code.  For non-overlayfs the two
are equivalent, except for an extra pointer dereference in locks_inode().

Since lock operations are in "struct file_operations" we must also make
sure not to call underlying filesystem's lock operations.  Introduce a
super block flag MS_NOREMOTELOCK to this effect.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>

c568d683
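
The difference between the two helpers named in the commit above can be illustrated with a small userspace model. This is only a sketch: the struct layouts are pared-down stand-ins for the kernel types, the field name f_dentry is hypothetical, and the assumption that locks_inode() reaches the inode through the file's dentry is inferred from the "extra pointer dereference" remark rather than taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Pared-down stand-ins for kernel types; only the fields used here. */
struct inode  { int ino; };
struct dentry { struct inode *d_inode; };
struct file {
    struct inode  *f_inode;   /* on overlayfs: the underlying (real) inode */
    struct dentry *f_dentry;  /* hypothetical field: dentry of the overlay inode */
};

/* Inode that backs the file's data. */
struct inode *file_inode(const struct file *f)
{
    assert(f != NULL);
    return f->f_inode;
}

/* Inode that carries lock state (assumed shape of the helper). */
struct inode *locks_inode(const struct file *f)
{
    assert(f != NULL && f->f_dentry != NULL);
    return f->f_dentry->d_inode;  /* one extra dereference */
}
```

In this model the two helpers agree for an ordinary file and differ only when the dentry points at an overlay inode, which is why lock state attached through locks_inode() would not move when the underlying inode changes on copy-up.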

vfs: update ovl inode before relatime check · 598e3c8f

Authored by Miklos Szeredi on Sep 16, 2016

On overlayfs relatime_need_update() needs inode times to be correct on
overlay inode. But i_mtime and i_ctime are updated by filesystem code on
underlying inode only, so they will be out-of-date on the overlay inode.

This patch copies the times from the underlying inode if needed. This
can't be done if called from RCU lookup (link following) but link m/ctime
are not updated by fs, so this is all right.

This patch doesn't change functionality for anything but overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

598e3c8f

vfs: move permission checking into notify_change() for utimes(NULL) · f2b20f6e

Authored by Miklos Szeredi on Sep 16, 2016

This fixes a bug where the permission was not properly checked in
overlayfs.  The testcase is ltp/utimensat01.

It is also cleaner and safer to do the permission checking in the vfs
helper instead of the caller.

This patch introduces an additional ia_valid flag ATTR_TOUCH (since
touch(1) is the most obvious user of utimes(NULL)) that is passed into
notify_change whenever the conditions for this special permission checking
mode are met.

Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Aihua Zhang <zhangaihua1@huawei.com>
Cc: <stable@vger.kernel.org> # v3.18+

f2b20f6e
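
The "special permission checking mode" for utimes(NULL) mentioned above can be modelled in a few lines. This is a sketch of the rule as documented for utimensat(2), not the kernel code: struct caller and may_update_times() are hypothetical names, and the real check also covers cases (such as immutable files) that are left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the caller; not a kernel structure. */
struct caller {
    bool owner_or_capable;  /* owns the file or has CAP_FOWNER */
    bool may_write;         /* has write permission on the file */
};

/*
 * times_given == false models utimes(path, NULL), i.e. "touch".
 * Returns 0 if the timestamp update is permitted, else a negative errno.
 */
int may_update_times(const struct caller *c, bool times_given)
{
    assert(c != NULL);
    if (c->owner_or_capable)
        return 0;
    if (times_given)
        return -EPERM;                  /* explicit times need ownership */
    return c->may_write ? 0 : -EACCES;  /* touch: write access suffices */
}
```

The point of the model is that the "touch" case is the only one where plain write access is enough, so the caller has to tell the helper which case applies; per the commit text, that is the role of ATTR_TOUCH.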

10 Sep, 2016 4 commits

fscrypto: require write access to mount to set encryption policy · ba63f23d

Authored by Eric Biggers on Sep 08, 2016

Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>

ba63f23d

fscrypto: only allow setting encryption policy on directories · 002ced4b

Authored by Eric Biggers on Sep 08, 2016

The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files.  This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.

In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.

This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later.  It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.

This patch restores the !S_ISDIR() check that was present in older
kernels.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

002ced4b

fscrypto: add authorization check for setting encryption policy · 163ae1c6

Authored by Eric Biggers on Sep 08, 2016

On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access.  This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy.  This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

163ae1c6
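
The ownership rule described in the commit above ("either the caller must own the file, or the caller must have the capability CAP_FOWNER") can be written down as a tiny model. This is a sketch only: struct policy_request and may_set_policy() are hypothetical names, and the -EACCES return value is an assumption, since the commit text does not say which error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the caller and the target inode. */
struct policy_request {
    unsigned caller_uid;
    unsigned inode_uid;
    bool has_cap_fowner;
};

/* Returns 0 if the caller may set an encryption policy, else -EACCES (assumed). */
int may_set_policy(const struct policy_request *req)
{
    assert(req != NULL);
    if (req->caller_uid == req->inode_uid || req->has_cap_fowner)
        return 0;
    return -EACCES;
}
```

Under this rule, read access to someone else's empty directory is no longer enough to set a policy on it.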

mm: fix show_smap() for zone_device-pmd ranges · ca120cf6

Authored by Dan Williams on Sep 03, 2016

Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
currently results in the following VM_BUG_ONs:

 kernel BUG at mm/huge_memory.c:1105!
 task: ffff88045f16b140 task.stack: ffff88045be14000
 RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
 [..]
 Call Trace:
  [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
  [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307656>] show_smap+0xa6/0x2b0

 kernel BUG at fs/proc/task_mmu.c:585!
 RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
 Call Trace:
  [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307696>] show_smap+0xa6/0x2b0

These locations are sanity checking page flags that must be set for an
anonymous transparent huge page, but are not set for the zone_device
pages associated with dax mappings.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ca120cf6

06 Sep, 2016 2 commits

btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress · ce129655

Authored by Wang Xiaoguang on Sep 02, 2016

In btrfs_async_reclaim_metadata_space(), we use ticket's address to
determine whether asynchronous metadata reclaim work is making progress.

	ticket = list_first_entry(&space_info->tickets,
				  struct reserve_ticket, list);
	if (last_ticket == ticket) {
		flush_state++;
	} else {
		last_ticket = ticket;
		flush_state = FLUSH_DELAYED_ITEMS_NR;
		if (commit_cycles)
			commit_cycles--;
	}

But this is wrong: we should not rely on a local variable's address for
this check, because addresses may be reused. In my test environment, I
dd'ed one 168MB file into a 256MB fs and found that, for this file, the
local variable ticket had the same address every time
wait_reserve_ticket() was called.

For the code above, assume a previous ticket's address is addrA, so
last_ticket is addrA. btrfs_async_reclaim_metadata_space() finishes this
ticket and wakes up its waiter, then another ticket is added at the same
address addrA. Now last_ticket equals the current ticket, so the current
ticket's flush work starts from the current flush_state rather than the
initial FLUSH_DELAYED_ITEMS_NR, which may result in ENOSPC issues (I have
seen this on my test machine).

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ce129655
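
The idea named in the commit title above, tracking progress with an id instead of a ticket address, can be sketched as follows. The commit text shown here does not include the new code, so this is an assumption about its shape: struct reclaim_tracker and reclaim_note_progress() are hypothetical, tickets_id is assumed to be a counter bumped whenever a ticket is satisfied, and the enum value is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FLUSH_DELAYED_ITEMS_NR = 1 };  /* illustrative initial flush state */

/* Hypothetical per-worker state. */
struct reclaim_tracker {
    uint64_t last_tickets_id;
    int flush_state;
};

/* tickets_id is assumed to change every time a ticket is satisfied. */
void reclaim_note_progress(struct reclaim_tracker *t, uint64_t tickets_id)
{
    assert(t != NULL);
    if (t->last_tickets_id == tickets_id) {
        t->flush_state++;                         /* no ticket completed: escalate */
    } else {
        t->last_tickets_id = tickets_id;
        t->flush_state = FLUSH_DELAYED_ITEMS_NR;  /* progress: start over */
    }
}
```

Unlike a pointer comparison, a counter cannot collide when a new ticket happens to be allocated at the address of a finished one.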

Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns · cbd60aa7

Authored by Chris Mason on Sep 06, 2016

We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out.  It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433deb added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list.  This
patch makes sure we remove things before exiting.

Signed-off-by: Chris Mason <clm@fb.com>
Fixes: d1433deb
cc: stable@vger.kernel.org # 3.15+

cbd60aa7

05 Sep, 2016 3 commits

btrfs: do not decrease bytes_may_use when replaying extents · ed7a6948

Authored by Wang Xiaoguang on Aug 26, 2016

When replaying extents, there is no need to update bytes_may_use
in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a
WARN_ON about bytes_may_use.

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ed7a6948

ceph: do not modify fi->frag in need_reset_readdir() · 0f5aa88a

Authored by Nicolas Iooss on Aug 28, 2016

Commit f3c4ebe6 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
            } else if (fi->frag |= fpos_frag(new_pos)) {
                       ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^
                       (                             )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^~
                                !=

Fixes: f3c4ebe6 ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0f5aa88a
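
The effect of the `|=` typo described above is easy to reproduce outside the kernel. The two functions below are hypothetical stand-ins for the condition in need_reset_readdir(), operating on a plain unsigned value instead of fi->frag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buggy form: "|=" writes to *frag and tests the OR-ed result. */
bool frag_differs_buggy(unsigned *frag, unsigned new_frag)
{
    assert(frag != NULL);
    return (*frag |= new_frag) != 0;
}

/* Intended form: a pure comparison that leaves *frag untouched. */
bool frag_differs(const unsigned *frag, unsigned new_frag)
{
    assert(frag != NULL);
    return *frag != new_frag;
}
```

The buggy form both modifies the stored value and reports "different" whenever either operand is non-zero, even if the two values are equal.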

ovl: fix workdir creation · e1ff3dd1

Authored by Miklos Szeredi on Sep 05, 2016

Workdir creation fails in latest kernel.

Fix by allowing EOPNOTSUPP as a valid return value from
vfs_removexattr(XATTR_NAME_POSIX_ACL_*).  Upper filesystem may not support
ACL and still be perfectly able to support overlayfs.

Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: c11b9fdd ("ovl: remove posix_acl_default from workdir")
Cc: <stable@vger.kernel.org>

e1ff3dd1

04 Sep, 2016 1 commit

devpts: return NULL pts 'priv' entry for non-devpts nodes · 3e423945

Authored by Linus Torvalds on Sep 03, 2016

In commit 8ead9dd5 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly.  And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.

And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.

So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: Eric W Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3e423945

01 Sep, 2016 17 commits

btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket() · e0af2484

Authored by Wang Xiaoguang on Aug 31, 2016

If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
btrfs_async_reclaim_metadata_space() will not reclaim metadata space; it
just returns directly and also forgets to wake up the processes that are
waiting for their tickets, so these processes will wait endlessly.

The fstests case generic/172 with mount option "-o compress=lzo" has revealed
this bug on my test machine. Here, if we have tickets to handle, we must
handle them first.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

e0af2484

Btrfs: fix endless loop in balancing block groups · a9b1fc85

Authored by Liu Bo on Aug 31, 2016

Qgroup function may overwrite the saved error 'err' with 0
in case quota is not enabled, and this ends up with an
endless loop in balance because we keep going back to balance
the same block group.

It really should use 'ret' instead.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

a9b1fc85

Btrfs: kill invalid ASSERT() in process_all_refs() · 3dc09ec8

Authored by Josef Bacik on Aug 24, 2016

Suppose you have the following tree in snap1 on a file system mounted with -o
inode_cache so that inode numbers are recycled

└── [    258]  a
    └── [    257]  b

and then you remove b, rename a to c, and then re-create b in c so you have the
following tree

└── [    258]  c
    └── [    257]  b

and then you try to do an incremental send you will hit

ASSERT(pending_move == 0);

in process_all_refs().  This is because we assume that any recycling of inodes
will not have a pending change in our path, which isn't the case.  This is the
case for the DELETE side, since we want to remove the old file using the old
path, but on the create side we could have a pending move and need to do the
normal pending rename dance.  So remove this ASSERT() and put a comment about
why we ignore pending_move.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

3dc09ec8

ovl: listxattr: use strnlen() · 7cb35119

Authored by Miklos Szeredi on Sep 01, 2016

Be defensive about what underlying fs provides us in the returned xattr
list buffer.  If it's not properly null terminated, bail out with a warning
instead of BUG.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>

7cb35119
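
The defensive walk described in the commit above can be sketched in userspace with strnlen(), which never reads past the given length. The function name count_xattr_names() is hypothetical, and where the kernel code bails out with a warning this sketch simply returns -1.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* strnlen() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a listxattr-style buffer (names separated by NUL
 * bytes). Returns -1 if the last name is not NUL-terminated within len.
 */
int count_xattr_names(const char *list, size_t len)
{
    assert(list != NULL || len == 0);
    int n = 0;
    while (len > 0) {
        size_t slen = strnlen(list, len);  /* never reads past len */
        if (slen == len)
            return -1;                     /* terminator missing */
        list += slen + 1;
        len -= slen + 1;
        n++;
    }
    return n;
}
```

With plain strlen() the same loop would run off the end of the buffer when the underlying filesystem returns a list whose final entry lacks its terminator.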

ovl: Switch to generic_getxattr · 0eb45fc3

Authored by Andreas Gruenbacher on Aug 22, 2016

Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
those same handlers for iop->getxattr as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0eb45fc3

ovl: copyattr after setting POSIX ACL · ce31513a

Authored by Miklos Szeredi on Sep 01, 2016

Setting a POSIX ACL may also modify the file mode, so we need to copy that
up to the overlay inode.

Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: d837a49b ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ce31513a

ovl: Switch to generic_removexattr · 0e585ccc

Authored by Andreas Gruenbacher on Aug 22, 2016

Commit d837a49b ("ovl: fix POSIX ACL setting") switches
iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
ovl_removexattr to generic_removexattr as well.  As far as permission
checking goes, the same rules should apply in either case.

While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
this is not an iop->setxattr implementation and remove the unused inode
argument.

Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
order of handlers in ovl_xattr_handlers.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49b ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0e585ccc

ovl: Get rid of ovl_xattr_noacl_handlers array · 0c97be22

Authored by Andreas Gruenbacher on Aug 22, 2016

Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
in ovl_xattr_handlers, like the other filesystems do.  Flag the code
that is now only used conditionally with __maybe_unused.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0c97be22

ovl: Fix OVL_XATTR_PREFIX · fe2b7595

Authored by Andreas Gruenbacher on Aug 22, 2016

Make sure ovl_own_xattr_handler only matches attribute names starting
with "overlay.", not "overlayXXX".

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49b ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fe2b7595
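
Why the trailing dot matters for the prefix fix above can be shown with an ordinary prefix comparison. has_prefix() is a hypothetical helper, and the strings are the ones quoted in the commit message; the exact value of OVL_XATTR_PREFIX before and after the patch is not shown here and is not assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name begins with prefix (plain byte-wise comparison). */
bool has_prefix(const char *name, const char *prefix)
{
    assert(name != NULL && prefix != NULL);
    return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

A prefix of "overlay" accepts "overlayXXX" as well as "overlay.opaque", while "overlay." accepts only the latter.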

ovl: fix spelling mistake: "directries" -> "directories" · fd36570a

Authored by Colin Ian King on Aug 18, 2016

Trivial fix to spelling mistake in pr_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fd36570a

ovl: don't cache acl on overlay layer · 2a3a2a3f

Authored by Miklos Szeredi on Sep 01, 2016

Some operations (setxattr/chmod) can make the cached acl stale.  We either
need to clear overlay's acl cache for the affected inode or prevent acl
caching on the overlay altogether.  Preventing caching has the following
advantages:

 - no double caching, less memory used

 - overlay cache doesn't go stale when fs clears its own cache

Possible disadvantage is performance loss.  If that becomes a problem
get_acl() can be optimized for overlayfs.

This patch disables caching by presetting i_*acl to a value that

  - has bit 0 set, so is_uncached_acl() will return true

  - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it

The constant -3 was chosen for this purpose.

Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

2a3a2a3f
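
The two properties required of the sentinel in the commit above (bit 0 set, and not equal to ACL_NOT_CACHED) can be checked with a short model. This is a sketch under assumptions: the macro name ACL_DONT_CACHE is made up for illustration, ACL_NOT_CACHED is assumed to be -1 cast to a pointer, and acl_disable_caching() and lookup_would_fill() are hypothetical helpers rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACL_NOT_CACHED ((void *)(intptr_t)-1)  /* assumed value: "not looked up yet" */
#define ACL_DONT_CACHE ((void *)(intptr_t)-3)  /* assumed name; value from the commit text */

/* Sentinels have bit 0 set; real, aligned pointers and NULL do not. */
bool is_uncached_acl(const void *acl)
{
    return ((intptr_t)acl & 1) != 0;
}

/* Preset a cache slot so that lookups never store a result in it. */
void acl_disable_caching(void **slot)
{
    assert(slot != NULL);
    *slot = ACL_DONT_CACHE;
}

/* A lookup only fills a slot that is still in the "not cached yet" state. */
bool lookup_would_fill(void *const *slot)
{
    assert(slot != NULL);
    return *slot == ACL_NOT_CACHED;
}
```

Because -3 is odd, the slot still reads as "uncached", yet it differs from the value that a lookup replaces, so the overlay inode never ends up holding a cached ACL.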

ovl: use cached acl on underlying layer · 5201dc44

Authored by Miklos Szeredi on Sep 01, 2016

Instead of calling ->get_acl() directly, use get_acl() to get the cached
value.

We will have the acl cached on the underlying inode anyway, because we do
permission checking on both the overlay and the underlying fs.

So, since we already have double caching, this improves performance without
any cost.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

5201dc44

ovl: proper cleanup of workdir · eea2fb48

Authored by Miklos Szeredi on Sep 01, 2016

When mounting overlayfs it needs a clean "work" directory under the
supplied workdir.

Previously the mount code removed this directory if it already existed and
created a new one.  If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.

While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of crash in the
middle of an operation using the work directory.

In this case the left over state should be discarded and the overlay
filesystem will be consistent, guaranteed by the atomicity of operations on
moving to/from the workdir to the upper layer.

This patch implements cleaning out any files left in workdir.  It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>

eea2fb48

ovl: remove posix_acl_default from workdir · c11b9fdd

Authored by Miklos Szeredi on Sep 01, 2016

Clear out posix acl xattrs on workdir and also reset the mode after
creation so that an inherited sgid bit is cleared.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>

c11b9fdd

ovl: handle umask and posix_acl_default correctly on creation · 38b25697

Authored by Miklos Szeredi on Sep 01, 2016

Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to
create functions without masking against umask.

Another problem when creating over a whiteout is that the default posix acl
is not inherited from the parent dir (because the real parent dir at the
time of creation is the work directory).

Fix these problems by:

 a) If upper fs does not have MS_POSIXACL, then mask mode with umask.

 b) If creating over a whiteout, call posix_acl_create() to get the
 inherited acls.  After creation (but before moving to the final
 destination) set these acls on the created file.  posix_acl_create() also
 updates the file creation mode as appropriate.

Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

38b25697

mm: introduce get_task_exe_file · cd81a917

Authored by Mateusz Guzik on Aug 23, 2016

For more convenient access if one has a pointer to the task.

As a minor nit take advantage of the fact that only task lock + rcu are
needed to safely grab ->exe_file. This saves mm refcount dance.

Use the helper in proc_exe_link.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Cc: <stable@vger.kernel.org> # 4.3.x
Signed-off-by: Paul Moore <paul@paul-moore.com>

cd81a917

binfmt_elf: switch to new creds when switching to new mm · 9f834ec1

Authored by Linus Torvalds on Aug 22, 2016

We used to delay switching to the new credentials until after we had
mapped the executable (and possible elf interpreter).  That was kind of
odd to begin with, since the new executable will actually then _run_
with the new creds, but whatever.

The bigger problem was that we also want to make sure that we turn off
prof events and tracing before we start mapping the new executable
state.  So while this is a cleanup, it's also a fix for a possible
information leak.

Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9f834ec1

31 Aug, 2016 2 commits

sysfs: correctly handle read offset on PREALLOC attrs · 17d0774f

Authored by Konstantin Khlebnikov on Jun 22, 2016

Attributes declared with __ATTR_PREALLOC use sysfs_kf_read(), which returns
zero bytes for a non-zero offset. This breaks the checkarray script in the
mdadm tool on Debian, where /bin/sh is 'dash', because its builtin 'read'
reads only one byte at a time. The script gets 'i' instead of 'idle' when it
reads the current action from /sys/block/$dev/md/sync_action and as a result
does nothing.

This patch adds a trivial implementation of partial read: generate the whole
string and move the required part to the head of the buffer.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 4ef67a8c ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
Cc: Stable <stable@vger.kernel.org> # v3.19+
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

17d0774f
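
The "generate whole string and move required part into buffer head" approach from the commit above can be sketched in userspace. The names read_prealloc_attr() and show_sync_action() are hypothetical, and the callback signature is simplified compared with the kernel's show() operation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute: always renders the full text "idle\n". */
long show_sync_action(char *buf)
{
    memcpy(buf, "idle\n", 5);
    return 5;
}

/*
 * Serve a read of up to count bytes at offset pos. buf must be large
 * enough for the full attribute text produced by show().
 */
long read_prealloc_attr(char *buf, size_t count, long pos, long (*show)(char *))
{
    assert(buf != NULL && show != NULL && pos >= 0);
    long len = show(buf);              /* always generate the whole string */
    if (len < 0)
        return len;
    if (pos) {
        if (len <= pos)
            return 0;                  /* offset at or past the end: EOF */
        len -= pos;
        memmove(buf, buf + pos, (size_t)len);  /* shift wanted part to head */
    }
    return (size_t)len < count ? len : (long)count;
}
```

A reader that asks for one byte at a time, as dash's builtin read does, then sees 'i', 'd', 'l', 'e' and the newline on successive calls instead of 'i' followed by end of file.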

kernfs: don't depend on d_find_any_alias() when generating notifications · df6a58c5

Authored by Tejun Heo on Jun 17, 2016

kernfs_notify_workfn() sends out file modified events for the
scheduled kernfs_nodes.  Because the modifications aren't from
userland, it doesn't have the matching file struct at hand and can't
use fsnotify_modify().  Instead, it looked up the inode and then used
d_find_any_alias() to find the dentry and used fsnotify_parent() and
fsnotify() directly to generate notifications.

The assumption was that the relevant dentries would have been pinned
if there are listeners, which isn't true as inotify doesn't pin
dentries at all and watching the parent doesn't pin the child dentries
even for dnotify.  This led to, for example, inotify watchers not
getting notifications if the system is under memory pressure and the
matching dentries got reclaimed.  It can also be triggered through
/proc/sys/vm/drop_caches or a remount attempt which involves shrinking
dcache.

fsnotify_parent() only uses the dentry to access the parent inode,
which kernfs can do easily.  Update kernfs_notify_workfn() so that it
uses fsnotify() directly for both the parent and target inodes without
going through d_find_any_alias().  While at it, supply the target file
name to fsnotify() from kernfs_node->name.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Fixes: d911d987 ("kernfs: make kernfs_notify() trigger inotify events too")
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

df6a58c5

30 Aug, 2016 4 commits

NFSv4.x: Fix a refcount leak in nfs_callback_up_net · 98b0f80c

Authored by Trond Myklebust on Aug 29, 2016

On error, the callers expect us to return without bumping
nn->cb_users[].

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.7+

98b0f80c

NFS4: Avoid migration loops · 52442f9b

Authored by Benjamin Coddington on Aug 30, 2016

If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server. Catch
this by checking if the nfs_client found is the same as the existing
client. For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

52442f9b

xfs: track log done items directly in the deferred pending work item · ea78d808

Authored by Darrick J. Wong on Aug 30, 2016

Christoph reports slab corruption when a deferred refcount update
aborts during _defer_finish().  The cause of this was broken log item
state tracking in xfs_defer_pending -- upon an abort,
_defer_trans_abort() will call abort_intent on all intent items,
including the ones that have already had a done item attached.

This is incorrect because each intent item has 2 refcounts: the first
is released when the intent item is committed to the log; and the
second is released when the _done_ item is committed to the log, or
by the intent creator if there is no done item.  In other words, once
we log the done item, responsibility for releasing the intent item's
second refcount is transferred to the done item and /must not/ be
performed by anything else.

The dfp_committed flag should have been tracking whether or not we had
a done item so that _defer_trans_abort could decide if it needs to
abort the intent item, but due to a thinko this was not the case.  Rip
it out and track the done item directly so that we do the right thing
w.r.t. intent item freeing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

ea78d808

pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails · 3dc14735

Authored by Trond Myklebust on Aug 29, 2016

If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
ff_layout_pg_init_write, then we currently end up clearing the layout
segment carried by the struct nfs_pageio_descriptor, causing an Oops
when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.

The fix is to ensure we return the layout and then retry.

Fixes: 446ca219 ("pNFS/flexfiles: When initing reads or writes, we...")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3dc14735

29 Aug, 2016 4 commits

iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems · 17de0a9f

Authored by Christoph Hellwig on Aug 29, 2016

Filesystems like XFS that use extents should not set the
FIEMAP_EXTENT_MERGED flag in the fiemap extent structures.  To allow
for both behaviors for the upcoming gfs2 usage split the iomap
type field into type and flags, and only set FIEMAP_EXTENT_MERGED if
the IOMAP_F_MERGED flag is set.  The flags field will also come in
handy for future features such as shared extents on reflink-enabled
file systems.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

17de0a9f

NFSv4.1: Remove obsolete and incorrect assignment in nfs4_callback_sequence · d138027a

Authored by Trond Myklebust on Aug 28, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d138027a

NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN · 2e80dbe7

Authored by Trond Myklebust on Aug 28, 2016

Defer freeing the slot until after we have processed the results from
OPEN and LAYOUTGET. This means that the server can rely on the
mechanism in RFC5661 Section 2.10.6.3 to ensure that replies to an
OPEN or LAYOUTGET/RETURN RPC call don't race with the callbacks that
apply to them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2e80dbe7

NFSv4.1: Defer bumping the slot sequence number until we free the slot · 07e8dcbd

Authored by Trond Myklebust on Aug 28, 2016

For operations like OPEN or LAYOUTGET, which return recallable state
(i.e. delegations and layouts) we want to enable the mechanism for
resolving recall races in RFC5661 Section 2.10.6.3.
To do so, we will want to defer bumping the slot's sequence number until
we have finished processing the RPC results.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

07e8dcbd

openanolis / cloud-kernel synced successfully almost 2 years ago
