Commits · 22f6b4d34fcf039c63a94e7670e0da24f8575a5a · OpenHarmony / kernel_linux

16 Sep, 2016: 3 commits

aio: mark AIO pseudo-fs noexec · 22f6b4d3

Committed by Jann Horn on Sep 16, 2016

This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set.  Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.

I have tested the patch on my machine.

To test the behavior, compile and run this:

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/personality.h>
    #include <linux/aio_abi.h>
    #include <err.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/syscall.h>

    int main(void) {
        personality(READ_IMPLIES_EXEC);
        aio_context_t ctx = 0;
        if (syscall(__NR_io_setup, 1, &ctx))
            err(1, "io_setup");

        char cmd[1000];
        sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
            (int)getpid());
        system(cmd);
        return 0;
    }

In the output, "rw-s" is good, "rwxs" is bad.
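Rather than reading the `grep` output by eye, the permission field of the `/[aio]` line can be checked mechanically. The helper below is a hypothetical sketch (the name `aio_mapping_is_exec` is not from the patch); it assumes the usual `/proc/<pid>/maps` layout `start-end perms offset dev inode path`, where the third character of `perms` is `x` for an executable mapping.

```c
#include <string.h>

/* Hypothetical helper (not from the patch): given one line of
 * /proc/<pid>/maps, return -1 if it is not the AIO ring mapping,
 * 1 if the AIO mapping is executable ("rwxs", bad), and 0 if it is
 * not executable ("rw-s", good). */
int aio_mapping_is_exec(const char *line)
{
    if (strstr(line, "/[aio]") == NULL)
        return -1;

    /* The perms field follows the first space: "start-end perms ..." */
    const char *perms = strchr(line, ' ');
    if (perms == NULL || strlen(perms + 1) < 4)
        return -1;
    perms++;

    /* perms is r/w/x followed by 's' (shared) or 'p' (private). */
    return perms[2] == 'x' ? 1 : 0;
}
```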
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: cap dedupe request structure size at PAGE_SIZE · b71dbf10

Committed by Darrick J. Wong on Sep 14, 2016

Kirill A Shutemov reports that the kernel doesn't try to cap dest_count
in any way, and uses the number to allocate kernel memory.  This causes
high order allocation warnings in the kernel log if someone passes in a
big enough value.  We should clamp the allocation at PAGE_SIZE to avoid
stressing the VM.

The two existing users of the dedupe ioctl never send more than 120
requests, so we can safely clamp dest_range at PAGE_SIZE, because with
4k pages we can handle up to 127 dedupe candidates.  Given the max
extent length of 16MB, we can end up doing 2GB of IO which is plenty.
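The 127 figure can be reproduced from the UAPI layout. The sketch below assumes a userspace mirror of `struct file_dedupe_range` (24-byte header) and `struct file_dedupe_range_info` (32 bytes per entry); the `*_mirror` names and both helper functions are hypothetical, and the header size is computed with `offsetof()` in the same spirit as Linus's note.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the UAPI layout (assumption: matches
 * struct file_dedupe_range_info / file_dedupe_range in <linux/fs.h>). */
struct dedupe_info_mirror {
    int64_t  dest_fd;
    uint64_t dest_offset;
    uint64_t bytes_deduped;
    int32_t  status;
    uint32_t reserved;
};

struct dedupe_range_mirror {
    uint64_t src_offset;
    uint64_t src_length;
    uint16_t dest_count;
    uint16_t reserved1;
    uint32_t reserved2;
    struct dedupe_info_mirror info[];   /* dest_count entries follow */
};

/* Largest dest_count whose whole request still fits in page_size bytes. */
size_t max_dedupe_dest_count(size_t page_size)
{
    size_t hdr = offsetof(struct dedupe_range_mirror, info);
    return (page_size - hdr) / sizeof(struct dedupe_info_mirror);
}

/* Upper bound on bytes one request can cover, given a per-extent cap. */
uint64_t max_dedupe_bytes(size_t page_size, uint64_t max_extent)
{
    return (uint64_t)max_dedupe_dest_count(page_size) * max_extent;
}
```

With 4 KiB pages this gives (4096 - 24) / 32 = 127 entries, and 127 * 16 MiB = 2,130,706,432 bytes, i.e. just under 2 GiB of I/O per request.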

[ Note: the "offsetof()" can't overflow, because 'count' is just a
  16-bit integer.  That's not obvious in the limited context of the
  patch, so I'm noting it here because it made me go look.  - Linus ]
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: fix return type of ioctl_file_dedupe_range · 5297e0f0

Committed by Darrick J. Wong on Sep 14, 2016

All the VFS functions in the dedupe ioctl path return int status, so
the ioctl handler ought to as well.

Found by Coverity, CID 1350952.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Sep, 2016: 1 commit

NFSv4.1: Fix the CREATE_SESSION slot number accounting · b519d408

Committed by Trond Myklebust on Sep 11, 2016

Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.
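The rule can be summarised as a small predicate. This is a hypothetical helper, not the kernel's code; the error values are the ones RFC 5661 assigns to `NFS4ERR_DELAY` and `NFS4ERR_STALE_CLIENTID`, and a timed-out RPC is modelled as a separate flag.

```c
#include <stdbool.h>

/* NFSv4 status values as assigned in RFC 5661. */
#define NFS4_OK                 0
#define NFS4ERR_DELAY           10008
#define NFS4ERR_STALE_CLIENTID  10022

/* Hypothetical helper: bump the CREATE_SESSION sequence id for every
 * reply except an RPC timeout, NFS4ERR_DELAY, or NFS4ERR_STALE_CLIENTID. */
bool create_session_bumps_seqid(bool rpc_timed_out, int nfs_status)
{
    if (rpc_timed_out)
        return false;            /* server may never have seen the call */
    switch (nfs_status) {
    case NFS4ERR_DELAY:
    case NFS4ERR_STALE_CLIENTID:
        return false;
    default:
        return true;             /* success and all other errors */
    }
}
```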
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org

10 Sep, 2016: 4 commits

fscrypto: require write access to mount to set encryption policy · ba63f23d

Committed by Eric Biggers on Sep 08, 2016

Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>

fscrypto: only allow setting encryption policy on directories · 002ced4b

Committed by Eric Biggers on Sep 08, 2016

The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files.  This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.

In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.

This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later.  It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.

This patch restores the !S_ISDIR() check that was present in older
kernels.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

fscrypto: add authorization check for setting encryption policy · 163ae1c6

Committed by Eric Biggers on Sep 08, 2016

On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access.  This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy.  This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.
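A simplified model of the new permission rule is shown below. It is an assumption-laden sketch: it ignores user namespaces and represents `CAP_FOWNER` as a plain boolean, whereas the real `inode_owner_or_capable()` consults the caller's credentials; `may_set_encryption_policy` is a hypothetical name.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Simplified model (no user namespaces): the caller may set an
 * encryption policy if it owns the inode or holds CAP_FOWNER. */
bool may_set_encryption_policy(uid_t inode_uid, uid_t caller_fsuid,
                               bool caller_has_cap_fowner)
{
    return caller_fsuid == inode_uid || caller_has_cap_fowner;
}
```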
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mm: fix show_smap() for zone_device-pmd ranges · ca120cf6

Committed by Dan Williams on Sep 03, 2016

Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
currently results in the following VM_BUG_ONs:

 kernel BUG at mm/huge_memory.c:1105!
 task: ffff88045f16b140 task.stack: ffff88045be14000
 RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
 [..]
 Call Trace:
  [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
  [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307656>] show_smap+0xa6/0x2b0

 kernel BUG at fs/proc/task_mmu.c:585!
 RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
 Call Trace:
  [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307696>] show_smap+0xa6/0x2b0

These locations are sanity checking page flags that must be set for an
anonymous transparent huge page, but are not set for the zone_device
pages associated with dax mappings.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

06 Sep, 2016: 2 commits

btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress · ce129655

Committed by Wang Xiaoguang on Sep 02, 2016

In btrfs_async_reclaim_metadata_space(), we use ticket's address to
determine whether asynchronous metadata reclaim work is making progress.

	ticket = list_first_entry(&space_info->tickets,
				  struct reserve_ticket, list);
	if (last_ticket == ticket) {
		flush_state++;
	} else {
		last_ticket = ticket;
		flush_state = FLUSH_DELAYED_ITEMS_NR;
		if (commit_cycles)
			commit_cycles--;
	}

But this is wrong: we should not rely on a local variable's address for this
check, because the same address may be reused. In my test environment, I dd'ed
one 168MB file into a 256MB fs and found that, for this file, every time
wait_reserve_ticket() was called, the local variable ticket had the same address.

For the code above, assume a previous ticket's address is addrA, so last_ticket
is addrA. btrfs_async_reclaim_metadata_space() finishes this ticket and wakes
it up; then another ticket is added, but at the same address addrA. Now
last_ticket equals the current ticket, so the current ticket's flush work starts
from the current flush_state rather than the initial FLUSH_DELAYED_ITEMS_NR,
which may result in some ENOSPC issues (I have seen this on my test machine).
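One way to avoid the address-reuse problem, and the approach the commit title points to, is to compare a counter instead of a pointer. The sketch below is a hypothetical userspace model: `tickets_id` is bumped each time a ticket is satisfied, `FLUSH_DELAYED_ITEMS_NR` is a placeholder value rather than the kernel's, and the struct and function names are invented for illustration.

```c
#include <stdint.h>

#define FLUSH_DELAYED_ITEMS_NR 1   /* placeholder, not the kernel's value */

struct space_info_model {
    uint64_t tickets_id;            /* bumped per completed ticket */
};

struct reclaim_state {
    uint64_t last_tickets_id;
    int flush_state;
    int commit_cycles;
};

/* Called when a ticket has been granted and its waiter woken. */
void ticket_completed(struct space_info_model *si)
{
    si->tickets_id++;
}

/* One progress check of the async reclaim loop: a changed counter means
 * at least one ticket was satisfied, regardless of memory addresses. */
void reclaim_check_progress(struct reclaim_state *rs,
                            const struct space_info_model *si)
{
    if (rs->last_tickets_id == si->tickets_id) {
        rs->flush_state++;          /* no progress: escalate flushing */
    } else {
        rs->last_tickets_id = si->tickets_id;
        rs->flush_state = FLUSH_DELAYED_ITEMS_NR;
        if (rs->commit_cycles)
            rs->commit_cycles--;
    }
}
```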
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns · cbd60aa7

Committed by Chris Mason on Sep 06, 2016

We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out.  It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433deb added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list.  This
patch makes sure we remove things before exiting.
Signed-off-by: Chris Mason <clm@fb.com>
Fixes: d1433deb
cc: stable@vger.kernel.org # 3.15+

05 Sep, 2016: 4 commits

btrfs: do not decrease bytes_may_use when replaying extents · ed7a6948

Committed by Wang Xiaoguang on Aug 26, 2016

When replaying extents, there is no need to update bytes_may_use
in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a
WARN_ON about bytes_may_use.

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ceph: do not modify fi->frag in need_reset_readdir() · 0f5aa88a

Committed by Nicolas Iooss on Aug 28, 2016

Commit f3c4ebe6 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
            } else if (fi->frag |= fpos_frag(new_pos)) {
                       ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^
                       (                             )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^~
                                !=
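The difference between the two operators can be seen in isolation. In this hypothetical reduction, `frag` stands in for `fi->frag` and `new_frag` for `fpos_frag(new_pos)`; the buggy form both mutates the stored fragment and tests the OR'ed result, while the intended form is a side-effect-free comparison.

```c
#include <stdbool.h>

/* Buggy form: '|=' ORs new_frag into *frag, then tests the result.
 * It reports a "change" whenever either value is nonzero and
 * corrupts the cached fragment as a side effect. */
bool frag_changed_buggy(unsigned *frag, unsigned new_frag)
{
    return (*frag |= new_frag) != 0;
}

/* Intended form: a pure inequality test that leaves *frag untouched. */
bool frag_changed_fixed(const unsigned *frag, unsigned new_frag)
{
    return *frag != new_frag;
}
```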

Fixes: f3c4ebe6 ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ovl: fix workdir creation · e1ff3dd1

Committed by Miklos Szeredi on Sep 05, 2016

Workdir creation fails in the latest kernel.

Fix by allowing EOPNOTSUPP as a valid return value from
vfs_removexattr(XATTR_NAME_POSIX_ACL_*).  Upper filesystem may not support
ACL and still be perfectly able to support overlayfs.
Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: c11b9fdd ("ovl: remove posix_acl_default from workdir")
Cc: <stable@vger.kernel.org>

pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs · 334a8f37

Committed by Trond Myklebust on Sep 04, 2016

If there are outstanding LAYOUTGET rpc calls, then we want to ensure that
we keep the layout stateid around so that we don't inadvertently pick up
an old/misordered sequence id.
The race is as follows:

Client				Server
======				======
LAYOUTGET(seqid)
LAYOUTGET(seqid)
				return LAYOUTGET(seqid+1)
				return LAYOUTGET(seqid+2)
process LAYOUTGET(seqid+2)
	forget layout
process LAYOUTGET(seqid+1)

If it forgets the layout stateid before processing seqid+1, then
the client will not check the layout->plh_barrier, and so will set
the stateid with seqid+1.
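The barrier check that the forgotten stateid bypasses can be modelled as follows. This is a hypothetical sketch: `layout_reply_accepted` stands in for the client's logic, and `seqid_is_newer` uses the usual serial-number idiom for comparing 32-bit sequence ids that may wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "s1 is newer than s2" for 32-bit sequence ids. */
bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
    return (int32_t)(s1 - s2) > 0;
}

/* Hypothetical model: a reply is applied only if its seqid is newer
 * than the barrier remembered with the layout stateid. Once the
 * stateid is forgotten there is no barrier, so a stale reply slips in. */
bool layout_reply_accepted(bool have_stateid, uint32_t barrier,
                           uint32_t reply_seqid)
{
    if (!have_stateid)
        return true;
    return seqid_is_newer(reply_seqid, barrier);
}
```

In the race above, processing seqid+2 raises the barrier to seqid+2, so the late seqid+1 reply is rejected only while the stateid is still remembered.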
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

04 Sep, 2016: 5 commits

devpts: return NULL pts 'priv' entry for non-devpts nodes · 3e423945

Committed by Linus Torvalds on Sep 03, 2016

In commit 8ead9dd5 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly.  And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.

And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.

So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pNFS: Clear out all layout segments if the server unsets lrp->res.lrs_present · 52ec7be2

Committed by Trond Myklebust on Sep 03, 2016

If the server fails to set lrp->res.lrs_present in the LAYOUTRETURN reply,
then that means it believes the client holds no more layout state for that
file, and that the layout stateid is now invalid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID · 2a59a041

Committed by Trond Myklebust on Sep 03, 2016

If the layout was marked as invalid, we want to ensure that we initialise
the layout header fields correctly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised · bf0291dd

Committed by Trond Myklebust on Sep 03, 2016

According to RFC5661, the client is responsible for serialising
LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case
where we send both in parallel.

Client					Server
======					======
LAYOUTGET(seqid=X)
LAYOUTRETURN(seqid=X)
					LAYOUTGET return seqid=X+1
					LAYOUTRETURN return seqid=X+2
Process LAYOUTRETURN
          Forget layout stateid
Process LAYOUTGET
          Set seqid=X+1

The client processes the layoutget/layoutreturn in the wrong order,
and since the result of the layoutreturn was to clear the only
existing layout segment, the client forgets the layout stateid.

When the LAYOUTGET comes in, it is treated as having a completely
new stateid, and so the client sets the wrong sequence id...

Fix is to check if there are outstanding LAYOUTGET requests
before we send the LAYOUTRETURN (note that LAYOUTGET will already
wait if it sees an outstanding LAYOUTRETURN).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Fix error reporting in nfs_file_write() · c49edecd

Committed by Trond Myklebust on Sep 03, 2016

When doing O_DSYNC writes, the actual write errors are reported through
generic_write_sync(), so we must test the result.
Reported-by: J. R. Okajima <hooanon05g@gmail.com>
Fixes: 18290650 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

01 Sep, 2016: 17 commits

btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket() · e0af2484

Committed by Wang Xiaoguang on Aug 31, 2016

If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
btrfs_async_reclaim_metadata_space() will not reclaim metadata space; it just
returns directly and also forgets to wake up the processes that are waiting for
their tickets, so these processes will wait endlessly.

Fstests case generic/172 with mount option "-o compress=lzo" has revealed
this bug on my test machine. Here, if we have tickets to handle, we must
handle them first.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: fix endless loop in balancing block groups · a9b1fc85

Committed by Liu Bo on Aug 31, 2016

Qgroup function may overwrite the saved error 'err' with 0
in case quota is not enabled, and this ends up in an
endless loop in balance because we keep going back to balance
the same block group.

It really should use 'ret' instead.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: kill invalid ASSERT() in process_all_refs() · 3dc09ec8

Committed by Josef Bacik on Aug 24, 2016

Suppose you have the following tree in snap1 on a file system mounted with -o
inode_cache so that inode numbers are recycled:

└── [    258]  a
    └── [    257]  b

and then you remove b, rename a to c, and then re-create b in c so you have the
following tree:

└── [    258]  c
    └── [    257]  b

and then you try to do an incremental send, you will hit

ASSERT(pending_move == 0);

in process_all_refs().  This is because we assume that any recycling of inodes
will not have a pending change in our path, which isn't the case.  This is the
case for the DELETE side, since we want to remove the old file using the old
path, but on the create side we could have a pending move and need to do the
normal pending rename dance.  So remove this ASSERT() and put a comment about
why we ignore pending_move.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ovl: listxattr: use strnlen() · 7cb35119

Committed by Miklos Szeredi on Sep 01, 2016

Be defensive about what the underlying fs provides us in the returned xattr
list buffer.  If it's not properly null terminated, bail out with a warning
instead of BUG.
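The defensive walk can be illustrated in userspace. The function below is a hypothetical sketch, not the overlayfs code: it counts the NUL-separated names in a `listxattr()`-style buffer and uses POSIX `strnlen()` so that a missing terminator yields `-1` instead of a read past the end of the buffer.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#include <string.h>
#include <sys/types.h>

/* Count names in a buffer of NUL-terminated strings of total length
 * 'size'. Returns -1 if the last name is not terminated within the
 * buffer, rather than scanning past its end. */
ssize_t count_xattr_names(const char *list, size_t size)
{
    ssize_t count = 0;
    size_t off = 0;

    while (off < size) {
        size_t remaining = size - off;
        size_t len = strnlen(list + off, remaining);

        if (len == remaining)       /* no terminator: malformed list */
            return -1;
        count++;
        off += len + 1;             /* skip the name and its NUL */
    }
    return count;
}
```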
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>

ovl: Switch to generic_getxattr · 0eb45fc3

Committed by Andreas Gruenbacher on Aug 22, 2016

Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use
those same handlers for iop->getxattr as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: copyattr after setting POSIX ACL · ce31513a

Committed by Miklos Szeredi on Sep 01, 2016

Setting a POSIX ACL may also modify the file mode, so the mode needs to be
copied up to the overlay inode.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: d837a49b ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: Switch to generic_removexattr · 0e585ccc

Committed by Andreas Gruenbacher on Aug 22, 2016

Commit d837a49b ("ovl: fix POSIX ACL setting") switches
iop->setxattr from ovl_setxattr to generic_setxattr, so switch from
ovl_removexattr to generic_removexattr as well.  As far as permission
checking goes, the same rules should apply in either case.

While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that
this is not an iop->setxattr implementation and remove the unused inode
argument.

Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the
order of handlers in ovl_xattr_handlers.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49b ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: Get rid of ovl_xattr_noacl_handlers array · 0c97be22

Committed by Andreas Gruenbacher on Aug 22, 2016

Use an ordinary #ifdef to conditionally include the POSIX ACL handlers
in ovl_xattr_handlers, like the other filesystems do.  Flag the code
that is now only used conditionally with __maybe_unused.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: Fix OVL_XATTR_PREFIX · fe2b7595

Committed by Andreas Gruenbacher on Aug 22, 2016

Make sure ovl_own_xattr_handler only matches attribute names starting
with "overlay.", not "overlayXXX".
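The effect of the trailing dot can be seen with a plain prefix test. This sketch assumes the prefix is `trusted.overlay.` (`XATTR_TRUSTED_PREFIX` followed by `overlay.`) and uses a hypothetical `is_ovl_own_xattr()` in place of the xattr handler's actual matching.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed value: "trusted." followed by "overlay." */
#define OVL_XATTR_PREFIX "trusted.overlay."

/* The trailing '.' is what stops "trusted.overlayXXX" from matching. */
bool is_ovl_own_xattr(const char *name)
{
    return strncmp(name, OVL_XATTR_PREFIX, strlen(OVL_XATTR_PREFIX)) == 0;
}
```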
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: d837a49b ("ovl: fix POSIX ACL setting")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: fix spelling mistake: "directries" -> "directories" · fd36570a

Committed by Colin Ian King on Aug 18, 2016

Trivial fix to spelling mistake in pr_err message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: don't cache acl on overlay layer · 2a3a2a3f

Committed by Miklos Szeredi on Sep 01, 2016

Some operations (setxattr/chmod) can make the cached acl stale.  We either
need to clear overlay's acl cache for the affected inode or prevent acl
caching on the overlay altogether.  Preventing caching has the following
advantages:

 - no double caching, less memory used

 - overlay cache doesn't go stale when fs clears its own cache

Possible disadvantage is performance loss.  If that becomes a problem
get_acl() can be optimized for overlayfs.

This patch disables caching by presetting i_*acl to a value that

  - has bit 0 set, so is_uncached_acl() will return true

  - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it

The constant -3 was chosen for this purpose.
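The two properties can be checked directly. The sketch below is a userspace model, assuming the kernel conventions that `ACL_NOT_CACHED` is `(void *)-1` and that `is_uncached_acl()` tests bit 0 of the pointer; the `*_MODEL` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel pointers modelled on the kernel's convention (assumption). */
#define ACL_NOT_CACHED_MODEL ((void *)(intptr_t)-1)
#define ACL_DONT_CACHE_MODEL ((void *)(intptr_t)-3)

/* A real ACL pointer is aligned, so bit 0 marks a sentinel value. */
bool is_uncached_acl_model(const void *acl)
{
    return ((uintptr_t)acl & 1) != 0;
}
```

In two's complement, -3 is ...11111101, so bit 0 is set while the value still differs from -1.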

Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: use cached acl on underlying layer · 5201dc44

Committed by Miklos Szeredi on Sep 01, 2016

Instead of calling ->get_acl() directly, use get_acl() to get the cached
value.

We will have the acl cached on the underlying inode anyway, because we do
permission checking on both the overlay and the underlying fs.

So, since we already have double caching, this improves performance without
any cost.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

ovl: proper cleanup of workdir · eea2fb48

Committed by Miklos Szeredi on Sep 01, 2016

When mounting overlayfs it needs a clean "work" directory under the
supplied workdir.

Previously the mount code removed this directory if it already existed and
created a new one.  If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.

While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of crash in the
middle of an operation using the work directory.

In this case the left over state should be discarded and the overlay
filesystem will be consistent, guaranteed by the atomicity of operations on
moving to/from the workdir to the upper layer.

This patch implements cleaning out any files left in workdir.  It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>

ovl: remove posix_acl_default from workdir · c11b9fdd

Committed by Miklos Szeredi on Sep 01, 2016

Clear out posix acl xattrs on workdir and also reset the mode after
creation so that an inherited sgid bit is cleared.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>

ovl: handle umask and posix_acl_default correctly on creation · 38b25697

Committed by Miklos Szeredi on Sep 01, 2016

Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to
create functions without masking against umask.

Another problem when creating over a whiteout is that the default posix acl
is not inherited from the parent dir (because the real parent dir at the
time of creation is the work directory).

Fix these problems by:

 a) If upper fs does not have MS_POSIXACL, then mask mode with umask.

 b) If creating over a whiteout, call posix_acl_create() to get the
 inherited acls.  After creation (but before moving to the final
 destination) set these acls on the created file.  posix_acl_create() also
 updates the file creation mode as appropriate.
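Step (a) amounts to applying the umask by hand when nothing else will. The helper below is a hypothetical sketch of that decision; `upper_has_posixacl` stands for whether the upper filesystem's superblock sets `MS_POSIXACL`, in which case the upper filesystem is assumed to apply the default ACL or the umask itself.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical sketch of step (a): mask the requested mode with the
 * umask only when the upper fs will not do it on its own. */
mode_t ovl_create_mode(mode_t requested, mode_t umask_bits,
                       bool upper_has_posixacl)
{
    if (upper_has_posixacl)
        return requested;           /* upper fs applies ACL/umask */
    return requested & ~umask_bits;
}
```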

Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

mm: introduce get_task_exe_file · cd81a917

Committed by Mateusz Guzik on Aug 23, 2016

For more convenient access if one has a pointer to the task.

As a minor nit, take advantage of the fact that only task lock + rcu are
needed to safely grab ->exe_file. This saves the mm refcount dance.

Use the helper in proc_exe_link.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Cc: <stable@vger.kernel.org> # 4.3.x
Signed-off-by: Paul Moore <paul@paul-moore.com>

binfmt_elf: switch to new creds when switching to new mm · 9f834ec1

Committed by Linus Torvalds on Aug 22, 2016

We used to delay switching to the new credentials until after we had
mapped the executable (and possible elf interpreter).  That was kind of
odd to begin with, since the new executable will actually then _run_
with the new creds, but whatever.

The bigger problem was that we also want to make sure that we turn off
prof events and tracing before we start mapping the new executable
state.  So while this is a cleanup, it's also a fix for a possible
information leak.
Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Aug, 2016: 2 commits

sysfs: correctly handle read offset on PREALLOC attrs · 17d0774f

Committed by Konstantin Khlebnikov on Jun 22, 2016

Attributes declared with __ATTR_PREALLOC use sysfs_kf_read(), which returns
zero bytes for a non-zero offset. This breaks the checkarray script in the mdadm
tool on Debian, where /bin/sh is 'dash', because its builtin 'read' reads only
one byte at a time. The script gets 'i' instead of 'idle' when it reads the
current action from /sys/block/$dev/md/sync_action and, as a result, does nothing.

This patch adds a trivial implementation of partial reads: generate the whole
string and move the required part to the head of the buffer.
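The "generate the whole string, then copy the requested window" approach can be sketched as below. The function is hypothetical; `full` plays the role of the attribute's `show()` output and `pos` the file offset. `memmove()` is used so the sketch also works when the window is moved within the same buffer, as the commit message describes.

```c
#include <string.h>
#include <sys/types.h>

/* Copy at most 'count' bytes of 'full' starting at offset 'pos' into
 * 'buf'. Returns the number of bytes copied, or 0 at end of file. */
ssize_t prealloc_partial_read(const char *full, size_t full_len,
                              size_t pos, char *buf, size_t count)
{
    if (pos >= full_len)
        return 0;                   /* EOF */
    size_t n = full_len - pos;
    if (n > count)
        n = count;
    memmove(buf, full + pos, n);    /* safe even if buf overlaps full */
    return (ssize_t)n;
}
```

Reading one byte at a time, as dash's builtin `read` does, then reassembles `idle\n` instead of stopping after `i`.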
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 4ef67a8c ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
Cc: Stable <stable@vger.kernel.org> # v3.19+
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kernfs: don't depend on d_find_any_alias() when generating notifications · df6a58c5

Committed by Tejun Heo on Jun 17, 2016

kernfs_notify_workfn() sends out file modified events for the
scheduled kernfs_nodes.  Because the modifications aren't from
userland, it doesn't have the matching file struct at hand and can't
use fsnotify_modify().  Instead, it looked up the inode and then used
d_find_any_alias() to find the dentry and used fsnotify_parent() and
fsnotify() directly to generate notifications.

The assumption was that the relevant dentries would have been pinned
if there are listeners, which isn't true as inotify doesn't pin
dentries at all and watching the parent doesn't pin the child dentries
even for dnotify.  This led to, for example, inotify watchers not
getting notifications if the system is under memory pressure and the
matching dentries got reclaimed.  It can also be triggered through
/proc/sys/vm/drop_caches or a remount attempt which involves shrinking
dcache.

fsnotify_parent() only uses the dentry to access the parent inode,
which kernfs can do easily.  Update kernfs_notify_workfn() so that it
uses fsnotify() directly for both the parent and target inodes without
going through d_find_any_alias().  While at it, supply the target file
name to fsnotify() from kernfs_node->name.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Fixes: d911d987 ("kernfs: make kernfs_notify() trigger inotify events too")
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

30 Aug, 2016: 2 commits

NFSv4.x: Fix a refcount leak in nfs_callback_up_net · 98b0f80c

Committed by Trond Myklebust on Aug 29, 2016

On error, the callers expect us to return without bumping
nn->cb_users[].
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.7+

NFS4: Avoid migration loops · 52442f9b

Committed by Benjamin Coddington on Aug 30, 2016

If a server returns itself as a location while migrating, the client may
end up getting stuck attempting to migrate twice to the same server. Catch
this by checking if the nfs_client found is the same as the existing
client. For the other two callers to nfs4_set_client, the nfs_client will
always be ERR_PTR(-EINVAL).
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

OpenHarmony / kernel_linux: last synced more than 4 years ago