提交 · 8fd1ab747d2b1ec7ec663ad0b41a32eaa35117a8 · openanolis / cloud-kernel

18 11月, 2017 14 次提交

NFSv4: Fix open create exclusive when the server reboots · 8fd1ab74

由 Trond Myklebust 提交于 11月 06, 2017

If the server that does not implement NFSv4.1 persistent session
semantics reboots while we are performing an exclusive create,
then the return value of NFS4ERR_DELAY when we replay the open
during the grace period causes us to lose the verifier.
When the grace period expires, and we present a new verifier,
the server will then correctly reply NFS4ERR_EXIST.

This commit ensures that we always present the same verifier when
replaying the OPEN.
Reported-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8fd1ab74

NFSv4: Add a tracepoint to document open stateid updates · ad9e02dc

由 Trond Myklebust 提交于 11月 06, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ad9e02dc

NFSv4: Fix OPEN / CLOSE race · c9399f21

由 Trond Myklebust 提交于 11月 06, 2017

Ben Coddington has noted the following race between OPEN and CLOSE
on a single client.

Process 1		Process 2		Server
=========		=========		======

1)  OPEN file
2)			OPEN file
3)						Process OPEN (1) seqid=1
4)						Process OPEN (2) seqid=2
5)						Reply OPEN (2)
6)			Receive reply (2)
7)			new stateid, seqid=2

8)			CLOSE file, using
			stateid w/ seqid=2
9)						Reply OPEN (1)
10(						Process CLOSE (8)
11)						Reply CLOSE (8)
12)						Forget stateid
						file closed

13)			Receive reply (7)
14)			Forget stateid
			file closed.

15) Receive reply (1).
16) New stateid seqid=1
    is really the same
    stateid that was
    closed.

IOW: the reply to the first OPEN is delayed. Since "Process 2" does
not wait before closing the file, and it does not cache the closed
stateid, then when the delayed reply is finally received, it is treated
as setting up a new stateid by the client.

The fix is to ensure that the client processes the OPEN and CLOSE calls
in the same order in which the server processed them.

This commit ensures that we examine the seqid of the stateid
returned by OPEN. If it is a new stateid, we assume the seqid
must be equal to the value 1, and that each state transition
increments the seqid value by 1 (See RFC7530, Section 9.1.4.2,
and RFC5661, Section 8.2.2).

If the tracker sees that an OPEN returns with a seqid that is greater
than the cached seqid + 1, then it bumps a flag to ensure that the
caller waits for the RPCs carrying the missing seqids to complete.

Note that there can still be pathologies where the server crashes before
it can even send us the missing seqids. Since the OPEN call is still
holding a slot when it waits here, that could cause the recovery to
stall forever. To avoid that, we time out after a 5 second wait.
Reported-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c9399f21

NFS: Fix bool initialization/comparison · 6089dd0d

由 Thomas Meyer 提交于 10月 07, 2017

Bool initializations should use true and false. Bool tests don't need
comparisons.
Signed-off-by: NThomas Meyer <thomas@m3y3r.de>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6089dd0d

NFS: Avoid RCU usage in tracepoints · 3944369d

由 Anna Schumaker 提交于 11月 01, 2017

There isn't an obvious way to acquire and release the RCU lock during a
tracepoint, so we can't use the rpc_peeraddr2str() function here.
Instead, rely on the client's cl_hostname, which should have similar
enough information without needing an rcu_dereference().
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Cc: stable@vger.kernel.org # v3.12
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3944369d

fs, nfs: convert nfs_client.cl_count from atomic_t to refcount_t · 212bf41d

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs_client.cl_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

212bf41d

fs, nfs: convert nfs_lock_context.count from atomic_t to refcount_t · 2f62b5aa

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs_lock_context.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2f62b5aa

fs, nfs: convert nfs4_lock_state.ls_count from atomic_t to refcount_t · 194bc1f4

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_lock_state.ls_count  is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

194bc1f4

fs, nfs: convert nfs_cache_defer_req.count from atomic_t to refcount_t · 0896cade

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs_cache_defer_req.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0896cade

fs, nfs: convert nfs4_ff_layout_mirror.ref from atomic_t to refcount_t · 81a090b9

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_ff_layout_mirror.ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

81a090b9

fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t · 2b28a7be

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2b28a7be

fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_t · eba6dd69

由 Elena Reshetova 提交于 10月 20, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eba6dd69

fs, nfs: convert nfs4_pnfs_ds.ds_count from atomic_t to refcount_t · a2a5dea7

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_pnfs_ds.ds_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a2a5dea7

NFSv4.1: Fix up replays of interrupted requests · 3be0f80b

由 Trond Myklebust 提交于 10月 19, 2017

If the previous request on a slot was interrupted before it was
processed by the server, then our slot sequence number may be out of whack,
and so we try the next operation using the old sequence number.

The problem with this, is that not all servers check to see that the
client is replaying the same operations as previously when they decide
to go to the replay cache, and so instead of the expected error of
NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could
(if the operations match up) be mistaken by the client for a new reply.

To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op
in order to resync our slot sequence number.

Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com>
[olga.kornievskaia@gmail.com: fix an Oops]
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3be0f80b

17 10月, 2017 4 次提交

NFS: remove special-case revalidate in nfs_opendir() · 1fea73ac

由 NeilBrown 提交于 8月 25, 2017

Commit f5a73672 ("NFS: allow close-to-open cache semantics to
apply to root of NFS filesystem") added a call to
__nfs_revalidate_inode() to nfs_opendir to as the lookup
process wouldn't reliable do this.

Subsequent commit a3fbbde7 ("VFS: we need to set LOOKUP_JUMPED
on mountpoint crossing") make this unnecessary.  So remove the
unnecessary code.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1fea73ac

NFS: revalidate "." etc correctly on "open". · b688741c

由 NeilBrown 提交于 8月 25, 2017

For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.

Since commit ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).

Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.

This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate().  This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.

The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.

Fixes: ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable@vger.kernel.org (3.9+)
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b688741c

NFS: Don't compare apples to elephants to determine access bits · 1750d929

由 Anna Schumaker 提交于 7月 26, 2017

The NFS_ACCESS_* flags aren't a 1:1 mapping to the MAY_* flags, so
checking for MAY_WHATEVER might have surprising results in
nfs*_proc_access().  Let's simplify this check when determining which
bits to ask for, and do it in a generic place instead of copying code
for each NFS version.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1750d929

NFS: Create NFS_ACCESS_* flags · 3c181827

由 Anna Schumaker 提交于 7月 26, 2017

Passing the NFS v4 flags into the v3 code seems weird to me, even if
they are defined to the same values.  This patch adds in generic flags
to help me feel better
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3c181827

14 10月, 2017 2 次提交

fs/binfmt_misc.c: node could be NULL when evicting inode · 7e866006

由 Eryu Guan 提交于 10月 13, 2017

inode->i_private is assigned by a Node pointer only after registering a
new binary format, so it could be NULL if inode was created by
bm_fill_super() (or iput() was called by the error path in
bm_register_write()), and this could result in NULL pointer dereference
when evicting such an inode.  e.g.  mount binfmt_misc filesystem then
umount it immediately:

  mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc
  umount /proc/sys/fs/binfmt_misc

will result in

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000013
  IP: bm_evict_inode+0x16/0x40 [binfmt_misc]
  ...
  Call Trace:
   evict+0xd3/0x1a0
   iput+0x17d/0x1d0
   dentry_unlink_inode+0xb9/0xf0
   __dentry_kill+0xc7/0x170
   shrink_dentry_list+0x122/0x280
   shrink_dcache_parent+0x39/0x90
   do_one_tree+0x12/0x40
   shrink_dcache_for_umount+0x2d/0x90
   generic_shutdown_super+0x1f/0x120
   kill_litter_super+0x29/0x40
   deactivate_locked_super+0x43/0x70
   deactivate_super+0x45/0x60
   cleanup_mnt+0x3f/0x70
   __cleanup_mnt+0x12/0x20
   task_work_run+0x86/0xa0
   exit_to_usermode_loop+0x6d/0x99
   syscall_return_slowpath+0xba/0xf0
   entry_SYSCALL_64_fastpath+0xa3/0xa

Fix it by making sure Node (e) is not NULL.

Link: http://lkml.kernel.org/r/20171010100642.31786-1-eguan@redhat.com
Fixes: 83f91827 ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()")
Signed-off-by: NEryu Guan <eguan@redhat.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e866006

fs/mpage.c: fix mpage_writepage() for pages with buffers · f892760a

由 Matthew Wilcox 提交于 10月 13, 2017

When using FAT on a block device which supports rw_page, we can hit
BUG_ON(!PageLocked(page)) in try_to_free_buffers().  This is because we
call clean_buffers() after unlocking the page we've written.  Introduce
a new clean_page_buffers() which cleans all buffers associated with a
page and call it from within bdev_write_page().

[akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew]
Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
Reported-by: NToshi Kani <toshi.kani@hpe.com>
Reported-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: NToshi Kani <toshi.kani@hpe.com>
Acked-by: NJohannes Thumshirn <jthumshirn@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f892760a

12 10月, 2017 7 次提交

xfs: handle error if xfs_btree_get_bufs fails · 93e8befc

由 Eric Sandeen 提交于 10月 09, 2017

Jason reported that a corrupted filesystem failed to replay
the log with a metadata block out of bounds warning:

XFS (dm-2): _xfs_buf_find: Block out of range: block 0x80270fff8, EOFS 0x9c40000

_xfs_buf_find() and xfs_btree_get_bufs() return NULL if
that happens, and then when xfs_alloc_fix_freelist() calls
xfs_trans_binval() on that NULL bp, we oops with:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8

We don't handle _xfs_buf_find errors very well, every
caller higher up the stack gets to guess at why it failed.
But we should at least handle it somehow, so return
EFSCORRUPTED here.
Reported-by: NJason L Tibbitts III <tibbs@math.uh.edu>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

93e8befc

xfs: reinit btree pointer on attr tree inactivation walk · f35c5e10

由 Brian Foster 提交于 10月 09, 2017

xfs_attr3_root_inactive() walks the attr fork tree to invalidate the
associated blocks. xfs_attr3_node_inactive() recursively descends
from internal blocks to leaf blocks, caching block address values
along the way to revisit parent blocks, locate the next entry and
descend down that branch of the tree.

The code that attempts to reread the parent block is unsafe because
it assumes that the local xfs_da_node_entry pointer remains valid
after an xfs_trans_brelse() and re-read of the parent buffer. Under
heavy memory pressure, it is possible that the buffer has been
reclaimed and reallocated by the time the parent block is reread.
This means that 'btree' can point to an invalid memory address, lead
to a random/garbage value for child_fsb and cause the subsequent
read of the attr fork to go off the rails and return a NULL buffer
for an attr fork offset that is most likely not allocated.

Note that this problem can be manufactured by setting
XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers,
creating a file with a multi-level attr fork and removing it to
trigger inactivation.

To address this problem, reinit the node/btree pointers to the
parent buffer after it has been re-read. This ensures btree points
to a valid record and allows the walk to proceed.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

f35c5e10

xfs: Fix bool initialization/comparison · 749f24f3

由 Thomas Meyer 提交于 10月 09, 2017

Bool initializations should use true and false. Bool tests don't need
comparisons.
Signed-off-by: NThomas Meyer <thomas@m3y3r.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

749f24f3

xfs: don't change inode mode if ACL update fails · 67f2ffe3

由 Dave Chinner 提交于 10月 09, 2017

If we get ENOSPC half way through setting the ACL, the inode mode
can still be changed even though the ACL does not exist. Reorder the
operation to only change the mode of the inode if the ACL is set
correctly.

Whilst this does not fix the problem with crash consistency (that requires
attribute addition to be a deferred op) it does prevent ENOSPC and other
non-fatal errors setting an xattr to be handled sanely.

This fixes xfstests generic/449.
Signed-Off-By: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

67f2ffe3

xfs: move more RT specific code under CONFIG_XFS_RT · bb9c2e54

由 Dave Chinner 提交于 10月 09, 2017

Various utility functions and interfaces that iterate internal
devices try to reference the realtime device even when RT support is
not compiled into the kernel.

Make sure this code is excluded from the CONFIG_XFS_RT=n build,
and where appropriate stub functions to return fatal errors if
they ever get called when RT support is not present.
Signed-Off-By: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

bb9c2e54

xfs: Don't log uninitialised fields in inode structures · 20413e37

由 Dave Chinner 提交于 10月 09, 2017

Prevent kmemcheck from throwing warnings about reading uninitialised
memory when formatting inodes into the incore log buffer. There are
several issues here - we don't always log all the fields in the
inode log format item, and we never log the inode the
di_next_unlinked field.

In the case of the inode log format item, this is exacerbated
by the old xfs_inode_log_format structure padding issue. Hence make
the padded, 64 bit aligned version of the structure the one we always
use for formatting the log and get rid of the 64 bit variant. This
means we'll always log the 64-bit version and so recovery only needs
to convert from the unpadded 32 bit version from older 32 bit
kernels.
Signed-Off-By: NDave Chinner <dchinner@redhat.com>
Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

20413e37

9p: set page uptodate when required in write_end() · 56ae414e

由 Alexander Levin 提交于 4月 10, 2017

Commit 77469c3f prevented setting the page as uptodate when we wrote
the right amount of data, fix that.

Fixes: 77469c3f ("9p: saner ->write_end() on failing copy into non-uptodate page")
Reviewed-by: NJan Kara <jack@suse.com>
Signed-off-by: NAlexander Levin <alexander.levin@verizon.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

56ae414e

11 10月, 2017 1 次提交

direct-io: Prevent NULL pointer access in submit_page_section · 899f0429

由 Andreas Gruenbacher 提交于 10月 09, 2017

In the code added to function submit_page_section by commit b1058b98,
sdio->bio can currently be NULL when calling dio_bio_submit.  This then
leads to a NULL pointer access in dio_bio_submit, so check for a NULL
bio in submit_page_section before trying to submit it instead.

Fixes xfstest generic/250 on gfs2.

Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

899f0429

10 10月, 2017 1 次提交

quota: Generate warnings for DQUOT_SPACE_NOFAIL allocations · ac3d7939

由 Jan Kara 提交于 10月 10, 2017

Eryu has reported that since commit 7b9ca4c6 "quota: Reduce
contention on dq_data_lock" test generic/233 occasionally fails. This is
caused by the fact that since that commit we don't generate warning and
set grace time for quota allocations that have DQUOT_SPACE_NOFAIL set
(these are for example some metadata allocations in ext4). We need these
allocations to behave regularly wrt warning generation and grace time
setting so fix the code to return to the original behavior.
Reported-and-tested-by: NEryu Guan <eguan@redhat.com>
CC: stable@vger.kernel.org
Fixes: 7b9ca4c6Signed-off-by: NJan Kara <jack@suse.cz>

ac3d7939

06 10月, 2017 1 次提交

nfsd4: define nfsd4_secinfo_no_name_release() · ec572b9e

由 Eryu Guan 提交于 9月 29, 2017

Commit 34b1744c ("nfsd4: define ->op_release for compound ops")
defined a couple ->op_release functions and run them if necessary.

But there's a problem with that is that it reused
nfsd4_secinfo_release() as the op_release of OP_SECINFO_NO_NAME, and
caused a leak on struct nfsd4_secinfo_no_name in
nfsd4_encode_secinfo_no_name(), because there's no .si_exp field in
struct nfsd4_secinfo_no_name.

I found this because I was unable to umount an ext4 partition after
exporting it via NFS & run fsstress on the nfs mount. A simplified
reproducer would be:

 # mount a local-fs device at /mnt/test, and export it via NFS with
 # fsid=0 export option (this is required)
 mount /dev/sda5 /mnt/test
 echo "/mnt/test *(rw,no_root_squash,fsid=0)" >> /etc/exports
 service nfs restart

 # locally mount the nfs export with all default, note that I have
 # nfsv4.1 configured as the default nfs version, because of the
 # fsid export option, v4 mount would fail and fall back to v3
 mount localhost:/mnt/test /mnt/nfs

 # try to umount the underlying device, but got EBUSY
 umount /mnt/nfs
 service nfs stop
 umount /mnt/test <=== EBUSY here

Fixed it by defining a separate nfsd4_secinfo_no_name_release()
function as the op_release method of OP_SECINFO_NO_NAME that
releases the correct nfsd4_secinfo_no_name structure.

Fixes: 34b1744c ("nfsd4: define ->op_release for compound ops")
Signed-off-by: NEryu Guan <eguan@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ec572b9e

05 10月, 2017 7 次提交

ovl: fix regression caused by exclusive upper/work dir protection · 85fdee1e

由 Amir Goldstein 提交于 9月 29, 2017

Enforcing exclusive ownership on upper/work dirs caused a docker
regression: https://github.com/moby/moby/issues/34672.

Euan spotted the regression and pointed to the offending commit.
Vivek has brought the regression to my attention and provided this
reproducer:

Terminal 1:

  mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
        merged/

Terminal 2:

  unshare -m

Terminal 1:

  umount merged
  mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
        merged/
  mount: /root/overlay-testing/merged: none already mounted or mount point
         busy

To fix the regression, I replaced the error with an alarming warning.
With index feature enabled, mount does fail, but logs a suggestion to
override exclusive dir protection by disabling index.
Note that index=off mount does take the inuse locks, so a concurrent
index=off will issue the warning and a concurrent index=on mount will fail.

Documentation was updated to reflect this change.

Fixes: 2cac0c00 ("ovl: get exclusive ownership on upper/work dirs")
Cc: <stable@vger.kernel.org> # v4.13
Reported-by: NEuan Kemp <euank@euank.com>
Reported-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

85fdee1e

ovl: fix missing unlock_rename() in ovl_do_copy_up() · 5820dc08

由 Amir Goldstein 提交于 9月 25, 2017

Use the ovl_lock_rename_workdir() helper which requires
unlock_rename() only on lock success.

Fixes: ("fd210b7d ovl: move copy up lock out")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

5820dc08

ovl: fix dentry leak in ovl_indexdir_cleanup() · dc7ab677

由 Amir Goldstein 提交于 9月 24, 2017

index dentry was not released when breaking out of the loop
due to index verification error.

Fixes: 415543d5 ("ovl: cleanup bad and stale index entries on mount")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

dc7ab677

ovl: fix dput() of ERR_PTR in ovl_cleanup_index() · 9f4ec904

由 Amir Goldstein 提交于 9月 24, 2017

Fixes: caf70cb2 ("ovl: cleanup orphan index entries")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

9f4ec904

ovl: fix error value printed in ovl_lookup_index() · e0082a0f

由 Amir Goldstein 提交于 9月 24, 2017

Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

e0082a0f

ovl: fix may_write_real() for overlayfs directories · 954c736f

由 Amir Goldstein 提交于 9月 18, 2017

Overlayfs directory file_inode() is the overlay inode whether the real
inode is upper or lower.

This fixes a regression in xfstest generic/158.

Fixes: 7c6893e3 ("ovl: don't allow writing ioctl on lower layer")
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

954c736f

NFSv4/pnfs: Fix an infinite layoutget loop · e8fa33a6

由 Trond Myklebust 提交于 10月 04, 2017

Since we can now use a lock stateid or a delegation stateid, that
differs from the context stateid, we need to change the test in
nfs4_layoutget_handle_exception() to take this into account.

This fixes an infinite layoutget loop in the NFS client whereby
it keeps retrying the initial layoutget using the same broken
stateid.

Fixes: 70d2f7b1 ("pNFS: Use the standard I/O stateid when...")
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e8fa33a6

04 10月, 2017 3 次提交

Btrfs: fix overlap of fs_info::flags values · 69ad5976

由 Tsutomu Itoh 提交于 10月 04, 2017

Because the values of BTRFS_FS_EXCL_OP and BTRFS_FS_QUOTA_OVERRIDE overlap,
we should change the value.

First, BTRFS_FS_EXCL_OP was set to 14.

  commit 171938e5 ("btrfs: track exclusive filesystem operation in flags")

Next, the value of BTRFS_FS_QUOTA_OVERRIDE was set to 14.

  commit f29efe29 ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE")

As a result, the value 14 overlapped, by accident.
This problem is solved by defining the value of BTRFS_FS_EXCL_OP as 16,
the flags are internal.

Fixes: f29efe29 ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE")
CC: stable@vger.kernel.org # 4.13+
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
[ minimize the change, update only BTRFS_FS_EXCL_OP ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

69ad5976

btrfs: avoid overflow when sector_t is 32 bit · 2d8ce70a

由 Goffredo Baroncelli 提交于 10月 03, 2017

Jean-Denis Girard noticed commit c821e7f3 "pass bytes to
btrfs_bio_alloc" (https://patchwork.kernel.org/patch/9763081/)
introduces a regression on 32 bit machines.
When CONFIG_LBDAF is _not_ defined (CONFIG_LBDAF == Support for large
(2TB+) block devices and files) sector_t is 32 bit on 32bit machines.

In the function submit_extent_page, 'sector' (which is sector_t type) is
multiplied by 512 to convert it from sectors to bytes, leading to an
overflow when the disk is bigger than 4GB (!).

I added a cast to u64 to avoid overflow.

Fixes: c821e7f3 ("btrfs: pass bytes to btrfs_bio_alloc")
CC: stable@vger.kernel.org # 4.13+
Signed-off-by: NGoffredo Baroncelli <kreijack@inwind.it>
Tested-by: NJean-Denis Girard <jd.girard@sysnux.pf>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

2d8ce70a

lsm: fix smack_inode_removexattr and xattr_getsecurity memleak · 57e7ba04

由 Casey Schaufler 提交于 9月 19, 2017

security_inode_getsecurity() provides the text string value
of a security attribute. It does not provide a "secctx".
The code in xattr_getsecurity() that calls security_inode_getsecurity()
and then calls security_release_secctx() happened to work because
SElinux and Smack treat the attribute and the secctx the same way.
It fails for cap_inode_getsecurity(), because that module has no
secctx that ever needs releasing. It turns out that Smack is the
one that's doing things wrong by not allocating memory when instructed
to do so by the "alloc" parameter.

The fix is simple enough. Change the security_release_secctx() to
kfree() because it isn't a secctx being returned by
security_inode_getsecurity(). Change Smack to allocate the string when
told to do so.

Note: this also fixes memory leaks for LSMs which implement
inode_getsecurity but not release_secctx, such as capabilities.
Signed-off-by: NCasey Schaufler <casey@schaufler-ca.com>
Reported-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: stable@vger.kernel.org
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

57e7ba04

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功