1. 14 July 2017, 17 commits
  2. 11 June 2017, 1 commit
  3. 10 June 2017, 10 commits
  4. 05 June 2017, 1 commit
  5. 04 June 2017, 1 commit
  6. 03 June 2017, 1 commit
    • dax: fix race between colliding PMD & PTE entries · e2093926
      Ross Zwisler authored
      We currently have two related PMD vs PTE races in the DAX code.  These
      can both be easily triggered by having two threads reading and writing
      simultaneously to the same private mapping, with the key being that
      private mapping reads can be handled with PMDs but private mapping
      writes are always handled with PTEs so that we can COW.
      
      Here is the first race:
      
        CPU 0					CPU 1
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
          handle_pte_fault()
            passes check for pmd_devmap()
      
      					(private mapping read)
      					__handle_mm_fault()
      					  create_huge_pmd()
      					    dax_iomap_pmd_fault() inserts PMD
      
            dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
            			  installed in our page tables at this spot.
      
      Here's the second race:
      
        CPU 0					CPU 1
      
        (private mapping read)
        __handle_mm_fault()
          passes check for pmd_none()
          create_huge_pmd()
            dax_iomap_pmd_fault() inserts PMD
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
      					(private mapping read)
      					__handle_mm_fault()
      					  passes check for pmd_none()
      					  create_huge_pmd()
      
          handle_pte_fault()
            dax_iomap_pte_fault() inserts PTE
      					    dax_iomap_pmd_fault() inserts PMD,
      					       but we already have a PTE at
      					       this spot.
      
      The core of the issue is that while there is isolation between faults to
      the same range in the DAX fault handlers via our DAX entry locking,
      there is no isolation between faults in the code in mm/memory.c.  This
      means for instance that this code in __handle_mm_fault() can run:
      
      	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
      		ret = create_huge_pmd(&vmf);
      
      But by the time we actually get to run the fault handler called by
      create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
      fault has installed a normal PMD here as a parent.  This is the cause of
      the 2nd race.  The first race is similar - there is the following check
      in handle_pte_fault():
      
      	} else {
      		/* See comment in pte_alloc_one_map() */
      		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
      			return 0;
      
      So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
      will bail and retry the fault.  This is correct, but there is nothing
      preventing the PMD from being installed after this check but before we
      actually get to the DAX PTE fault handlers.
      
      In my testing these races result in the following types of errors:
      
        BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
        BUG: non-zero nr_ptes on freeing mm: 15
      
      Fix this issue by having the DAX fault handlers verify that it is safe
      to continue their fault after they have taken an entry lock to block
      other racing faults.
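      
      A minimal sketch of the shape of that recheck in the PTE fault path (the
      exact guards and labels in the committed patch may differ):
      
      	/*
      	 * Re-check after taking the DAX entry lock: a racing PMD fault may
      	 * already have installed a huge entry at this address, in which
      	 * case this PTE fault should back off and let the fault be retried.
      	 */
      	if (pmd_trans_huge(*vmf->pmd) || pmd_devmap(*vmf->pmd)) {
      		ret = VM_FAULT_NOPAGE;
      		goto unlock_entry;
      	}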
      
      [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
        Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
      Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 01 June 2017, 3 commits
    • btrfs: fix race with relocation recovery and fs_root setup · a9b3311e
      Jeff Mahoney authored
      If we have to recover relocation during mount, we'll ultimately have to
      evict the orphan inode.  That goes through the reservation dance, where
      priority_reclaim_metadata_space and flush_space expect fs_info->fs_root
      to be valid.  That's the next thing to be set up during mount, so we
      crash, almost always in flush_space trying to join the transaction,
      but priority_reclaim_metadata_space is possible as well.  This call
      path has been problematic in the past WRT whether ->fs_root is valid
      yet.  Commit 957780eb (Btrfs: introduce ticketed enospc
      infrastructure) added new users that are called in the direct path
      instead of the async path that had already been worked around.
      
      The thing is that we don't actually need the fs_root, specifically, for
      anything.  We either use it to determine whether the root is the
      chunk_root for use in choosing an allocation profile, or as a root to pass
      to btrfs_join_transaction before immediately committing it.  Anything that
      isn't the chunk root works in the former case and any root works in
      the latter.
      
      A simple fix is to use a root we know will always be there: the
      extent_root.
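      
      The shape of the change at those call sites is roughly the following (a
      sketch only; the committed diff may differ in detail, and treating
      btrfs_commit_transaction() as single-argument assumes this kernel era):
      
      	/* any valid root works here; extent_root is set up before recovery */
      	trans = btrfs_join_transaction(fs_info->extent_root);
      	if (IS_ERR(trans))
      		return PTR_ERR(trans);
      	return btrfs_commit_transaction(trans);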
      
      Cc: <stable@vger.kernel.org> # v4.8+
      Fixes: 957780eb (Btrfs: introduce ticketed enospc infrastructure)
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix memory leak in update_space_info failure path · 896533a7
      Jeff Mahoney authored
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
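      
      A sketch of the failure path with the leak plugged (identifier names are
      recalled from the btrfs sources and may not match the diff exactly):
      
      	ret = kobject_init_and_add(&space_info->kobj, &space_info_ktype,
      				   info->space_info_kobj, "%s",
      				   alloc_name(space_info->flags));
      	if (ret) {
      		/* undo the percpu_counter_init() done earlier in this path */
      		percpu_counter_destroy(&space_info->total_bytes_pinned);
      		kfree(space_info);
      		return ret;
      	}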
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use correct types for page indices in btrfs_page_exists_in_range · cc2b702c
      David Sterba authored
      Variables start_idx and end_idx are supposed to hold a page index
      derived from the file offsets. The int type is not the right one though,
      offsets larger than 1 << 44 will get silently trimmed off the high bits.
      (1 << 44 is 16TiB)
      
      What can go wrong, if start is below the boundary and end gets trimmed:
      - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
      - the final check "if (page->index <= end_idx)" will unexpectedly fail
      
      The function will return false, i.e. "there's no page in the range",
      although there is at least one.
      
      btrfs_page_exists_in_range is used to prevent races in:
      
      * in hole punching, where we make sure there are not pages in the
        truncated range, otherwise we'll wait for them to finish and redo
        truncation, but we're going to replace the pages with holes anyway so
        the only problem is the intermediate state
      
      * lock_extent_direct: we want to make sure there are no pages before we
        lock and start DIO, to prevent stale data reads
      
      For a practical occurrence of the bug, there are several constraints.  The
      file must be quite large, the affected range must cross the 16TiB
      boundary, and the internal state of the file pages and pending operations
      must match.  Also, we must not have started any ordered data in the
      range, otherwise we don't even reach the buggy function check.
      
      DIO locking tries hard in several places to avoid deadlocks with
      buffered IO and avoids waiting for ranges. The worst consequence seems
      to be stale data read.
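      
      The fix itself is essentially a type change for the two index variables,
      along the lines of the sketch below (whether the patch uses pgoff_t or
      unsigned long is an assumption here):
      
      	/* derived from u64 file offsets, so an int would truncate high bits */
      	pgoff_t start_idx = start >> PAGE_SHIFT;
      	pgoff_t end_idx = end >> PAGE_SHIFT;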
      
      CC: Liu Bo <bo.li.liu@oracle.com>
      CC: stable@vger.kernel.org	# 3.16+
      Fixes: fc4adbff ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  8. 31 May 2017, 2 commits
    • xfs: use ->b_state to fix buffer I/O accounting release race · 63db7c81
      Brian Foster authored
      We've had user reports of unmount hangs in xfs_wait_buftarg() that
      analysis shows is due to btp->bt_io_count == -1. bt_io_count
      represents the count of in-flight asynchronous buffers and thus
      should always be >= 0. xfs_wait_buftarg() waits for this value to
      stabilize to zero in order to ensure that all untracked (with
      respect to the lru) buffers have completed I/O processing before
      unmount proceeds to tear down in-core data structures.
      
      The value of -1 implies an I/O accounting decrement race. Indeed,
      the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
      (where the buffer lock is no longer held) means that bp->b_flags can
      be updated from an unsafe context. While a user-level reproducer is
      currently not available, some intrusive hacks to run racing buffer
      lookups/ioacct/releases from multiple threads were used to
      successfully manufacture this problem.
      
      Existing callers do not expect to acquire the buffer lock from
      xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
      this context. It turns out that we already have separate buffer
      state bits and associated serialization for dealing with buffer LRU
      state in the form of ->b_state and ->b_lock. Therefore, replace the
      _XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
      accounting wrappers appropriately and make sure they are used with
      the correct locking. This ensures that buffer in-flight state can be
      modified at buffer release time without racing with modifications
      from a buffer lock holder.
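      
      A sketch of what the increment side looks like with the in-flight bit
      kept in ->b_state and serialized by ->b_lock (illustrative, not the
      verbatim patch):
      
      	static void
      	xfs_buf_ioacct_inc(
      		struct xfs_buf	*bp)
      	{
      		if (bp->b_flags & XBF_NO_IOACCT)
      			return;
      
      		ASSERT(bp->b_flags & XBF_ASYNC);
      		spin_lock(&bp->b_lock);
      		if (!(bp->b_state & XFS_BSTATE_IN_FLIGHT)) {
      			bp->b_state |= XFS_BSTATE_IN_FLIGHT;
      			percpu_counter_inc(&bp->b_target->bt_io_count);
      		}
      		spin_unlock(&bp->b_lock);
      	}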
      
      Fixes: 9c7504aa ("xfs: track and serialize in-flight async buffers against unmount")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Libor Pechacek <lpechacek@suse.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • "Yes, people use FOLL_FORCE ;)" · f511c0b1
      Linus Torvalds authored
      This effectively reverts commit 8ee74a91 ("proc: try to remove use
      of FOLL_FORCE entirely")
      
      It turns out that people do depend on FOLL_FORCE for the /proc/<pid>/mem
      case, and we're talking not just debuggers. Talking to the affected
      people, the use-cases are:
      
      Keno Fischer:
       "We used these semantics as a hardening mechanism in the julia JIT. By
        opening /proc/self/mem and using these semantics, we could avoid
        needing RWX pages, or a dual mapping approach. We do have fallbacks to
        these other methods (though getting EIO here actually causes an assert
        in released versions - we'll updated that to make sure to take the
        fall back in that case).
      
        Nevertheless the /proc/self/mem approach was our favored approach
        because it a) Required an attacker to be able to execute syscalls
        which is a taller order than getting memory write and b) didn't double
        the virtual address space requirements (as a dual mapping approach
        would).
      
        I think in general this feature is very useful for anybody who needs
        to precisely control the execution of some other process. Various
        debuggers (gdb/lldb/rr) certainly fall into that category, but there's
        another class of such processes (wine, various emulators) which may
        want to do that kind of thing.
      
        Now, I suspect most of these will have the other process under ptrace
        control, so maybe allowing (same_mm || ptraced) would be ok, but at
        least for the sandbox/remote-jit use case, it would be perfectly
        reasonable to not have the jit server be a ptracer"
      
      Robert O'Callahan:
       "We write to readonly code and data mappings via /proc/.../mem in lots
        of different situations, particularly when we're adjusting program
        state during replay to match the recorded execution.
      
        Like Julia, we can add workarounds, but they could be expensive."
      
      so not only do people use FOLL_FORCE for both reads and writes, but they
      use it for both the local mm and remote mm.
      
      With these comments in mind, we likely also cannot add the "are we
      actively ptracing" check either, so this keeps the new code organization
      and does not do a real revert that would add back the original comment
      about "Maybe we should limit FOLL_FORCE to actual ptrace users?"
      Reported-by: Keno Fischer <keno@juliacomputing.com>
      Reported-by: Robert O'Callahan <robert@ocallahan.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 30 May 2017, 1 commit
    • ext4: fix fdatasync(2) after extent manipulation operations · 67a7d5f5
      Jan Kara authored
      Currently, extent manipulation operations such as hole punch, range
      zeroing, or extent shifting do not record the fact that file data has
      changed and thus fdatasync(2) has work to do. As a result, if we crash
      e.g. after a punch hole and fdatasync, the user can still possibly see the
      punched-out data after journal replay. Test generic/392 fails due to
      these problems.
      
      Fix the problem by properly marking that file data has changed in these
      operations.
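      
      Concretely, the marking amounts to a call like the following in each of
      the affected code paths, so that a later fdatasync(2) knows it must force
      a journal commit (a sketch of the idea, not the complete diff):
      
      	/* datasync flag set: the change touches file data, not just metadata */
      	ext4_update_inode_fsync_trans(handle, inode, 1);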
      
      CC: stable@vger.kernel.org
      Fixes: a4bb6b64
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  10. 29 May 2017, 3 commits
    • ovl: filter trusted xattr for non-admin · a082c6f6
      Miklos Szeredi authored
      Filesystems filter out extended attributes in the "trusted." domain for
      unprivileged callers.
      
      Overlayfs calls the underlying filesystem's method with elevated
      privileges, so the filtering needs to be done in overlayfs too.
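      
      A sketch of the kind of check needed when listing xattrs through
      overlayfs (illustrative; the committed helper may be structured
      differently):
      
      	/* hide "trusted." attributes from callers without CAP_SYS_ADMIN */
      	if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) &&
      	    !capable(CAP_SYS_ADMIN))
      		continue;	/* skip this entry in the listxattr output */
      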
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ovl: mark upper merge dir with type origin entries "impure" · f3a15685
      Amir Goldstein authored
      An upper dir is marked "impure" to let ovl_iterate() know that this
      directory may contain non-pure upper entries whose d_ino may need to be
      read from the origin inode.
      
      We already mark a non-merge dir "impure" when moving a non-pure child
      entry inside it, to let ovl_iterate() know not to iterate the non-merge
      dir directly.
      
      Mark also a merge dir "impure" when moving a non-pure child entry inside
      it and when copying up a child entry inside it.
      
      This can be used to optimize ovl_iterate() to perform a "pure merge" of
      upper and lower directories, merging the content of the directories,
      without having to read d_ino from origin inodes.
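      
      Marking a directory "impure" boils down to persistently setting an
      overlay xattr on its upper dir, roughly as below (illustrative only;
      overlayfs has its own setxattr helpers for this):
      
      	/* flag the upper dir so ovl_iterate() knows d_ino may need the origin */
      	err = vfs_setxattr(upperdentry, "trusted.overlay.impure", "y", 1, 0);
      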
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • ocfs2: Use ERR_CAST() to avoid cross-structure cast · 7585d12f
      Kees Cook authored
      When trying to propagate an error result, the error return path attempts
      to retain the error, but does this with an open cast across very different
      types, which the upcoming structure layout randomization plugin flags as
      being potentially dangerous in the face of randomization. This is a false
      positive, but what this code actually wants to do is use ERR_CAST() to
      retain the error value.
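      
      The pattern being fixed is the usual one for propagating an ERR_PTR
      across pointer types, e.g. (a generic illustration with a hypothetical
      lookup helper, not the exact ocfs2 hunk):
      
      	struct inode *inode = some_lookup(sb);	/* hypothetical helper */
      
      	if (IS_ERR(inode))
      		return ERR_CAST(inode);	/* keeps the errno, no raw cross-type cast */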
      
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>