Commits · efb0ca765ac6f4985b57ef215e8d55e746b083f4 · openanolis / cloud-kernel

07 Jul, 2017 2 commits

ceph: update the 'approaching max_size' code · efb0ca76

Authored by Yan, Zheng on May 22, 2017

The old 'approaching max_size' code expects the MDS to set max_size to
'2 * reported_size'. This is no longer true. The new code reports the
file size when half of the previous max_size increment has been used.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
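
The reporting rule described in this commit can be sketched in a few lines of Python. This is only a model of the "half of the previous increment" rule from the message; the function name, its parameters, and the exact comparison are assumptions made for illustration, not the kernel code.

```python
MiB = 1 << 20


def should_report_size(size, reported_size, max_size):
    """Hypothetical model: report once half of the last max_size increment is used."""
    if size >= max_size:
        # Already at (or past) the limit, so a report is clearly needed.
        return True
    if max_size > reported_size:
        # The increment is (max_size - reported_size). Half of it is used when
        # size >= reported_size + (max_size - reported_size) / 2, which is the
        # same as 2 * size >= max_size + reported_size (no division needed).
        return 2 * size >= max_size + reported_size
    return False


# Example: size last reported at 4 MiB, max_size granted at 12 MiB,
# so the midpoint of the increment is 8 MiB.
print(should_report_size(7 * MiB, 4 * MiB, 12 * MiB))  # False
print(should_report_size(8 * MiB, 4 * MiB, 12 * MiB))  # True
```

Unlike a rule that assumes max_size is always twice the reported size, this form depends only on the two values the client actually knows.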

ceph: re-request max size after importing caps · 84eea8c7

Authored by Yan, Zheng on May 16, 2017

The 'wanted max size' could have been sent to the inode's old auth MDS.
Re-send it to the inode's new auth MDS if necessary; otherwise the write
syscall may hang.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

29 Jun, 2017 1 commit

block: provide bio_uninit() free freeing integrity/task associations · 9ae3b3f5

Authored by Jens Axboe on Jun 28, 2017

Wen reports significant memory leaks with DIF and O_DIRECT:

"With nvme devive + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
leaking.

/proc/meminfo | grep SUnreclaim...

SUnreclaim:      6752128 kB
SUnreclaim:      6874880 kB
SUnreclaim:      7238080 kB
....
SUnreclaim:     22307264 kB
SUnreclaim:     22485888 kB
SUnreclaim:     22720256 kB

When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure."

Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
doesn't go through the regular bio free. This means that any ancillary
data allocated with the bio through the stack is not freed. Hence, we
can leak the integrity data associated with the bio, if the device is
using DIF/DIX.

Fix this by providing a bio_uninit() and export it, so that we can use
it to free this data. Note that this is a minimal fix for this issue.
Any current user of bio's that are allocated outside of
bio_alloc_bioset() suffers from this issue, most notably some drivers.
We will fix those in a more comprehensive patch for 4.13. This also
means that the commit marked as being fixed by this isn't the real
culprit, it's just the most obvious one out there.

Fixes: 542ff7bf ("block: new direct I/O implementation")
Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Jun, 2017 6 commits

ovl: don't set origin on broken lower hardlink · fbaf94ee

Authored by Miklos Szeredi on Jun 28, 2017

When copying up a file that has multiple hard links we need to break any
association with the origin file.  This makes copy-up be essentially an
atomic replace.

The new file has nothing to do with the old one (except having the same
data and metadata initially), so don't set the overlay.origin attribute.

We can relax this in the future when we are able to index upper object by
origin.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 3a1e819b ("ovl: store file handle of lower inode on copy up")

ovl: copy-up: don't unlock between lookup and link · e85f82ff

Authored by Miklos Szeredi on Jun 28, 2017

Nothing prevents mischief on upper layer while we are busy copying up the
data.

Move the lookup right before the looked up dentry is actually used.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 01ad3eb8 ("ovl: concurrent copy up of regular files")
Cc: <stable@vger.kernel.org> # v4.11

NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete() · 2e31b4cb

Authored by Trond Myklebust on Jun 27, 2017

The current code works only for the case where we have exactly one slot,
which is no longer true.
nfs4_free_slot() will automatically declare the callback channel to be
drained when all slots have been returned.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind" · d9f29500

Authored by Benjamin Coddington on Jun 16, 2017

This reverts commit 920b4530, which could call d_move() without holding
the directory's i_mutex, and reverts commit d4ea7e3c "NFS: Fix old dentry
rehash after move", which was a follow-up fix.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 920b4530 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
Cc: stable@vger.kernel.org # v4.10+
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFSv4.1: Fix a race in nfs4_proc_layoutget · bd171930

Authored by Trond Myklebust on Jun 27, 2017

If the task calling layoutget is signalled, then it is possible for the
calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race,
in which case we leak a slot.
The fix is to move the call to nfs4_sequence_free_slot() into the
nfs4_layoutget_release() so that it gets called at task teardown time.

Fixes: 2e80dbe7 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Trunking detection should handle ERESTARTSYS/EINTR · 898fc11b

Authored by Trond Myklebust on Jun 21, 2017

Currently, it will return EIO in those cases.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

24 Jun, 2017 4 commits

fs/exec.c: account for argv/envp pointers · 98da7d08

Authored by Kees Cook on Jun 23, 2017

When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included.  This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.

For example, on 32-bit with an 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).

The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely.  Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).

[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea3 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Qualys Security Advisory <qsa@qualys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
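
The arithmetic in this commit message can be checked with a short Python sketch. The numbers (8MB rlimit, 1/4 budget, 4-byte pointers, 1677721 single-byte strings) come from the message; the helper name `stack_usage` and the constant names are made up for this illustration.

```python
STACK_RLIMIT = 8 * 1024 * 1024      # 8MB stack rlimit from the example
STRING_BUDGET = STACK_RLIMIT // 4   # strings are limited to 1/4 of the stack
POINTER_SIZE = 4                    # bytes per pointer on 32-bit


def stack_usage(num_strings, bytes_per_string=1):
    """Return (bytes used by the strings, bytes used by the pointers to them)."""
    return num_strings * bytes_per_string, num_strings * POINTER_SIZE


strings, pointers = stack_usage(1677721)
print(strings <= STRING_BUDGET)              # True: strings alone pass the 1/4 check
print(pointers)                              # 6710884
print(strings + pointers)                    # 8388605
print(STACK_RLIMIT - (strings + pointers))   # 3 bytes left of the 8MB limit
```

The strings stay well within their 2MB budget, yet strings plus pointers leave only three bytes of the stack limit, which is why the pointers need to be accounted for as well.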

ocfs2: fix deadlock caused by recursive locking in xattr · 8818efaa

Authored by Eric Ren on Jun 23, 2017

Another deadlock path caused by recursive locking is reported.  This
kind of issue was introduced by commit 743b5f14 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
fixed by commit b891fa50 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points").  Yes, we intend to fix this kind of
case in an incremental way, because it's hard to find all possible
paths at once.

This one can be reproduced like this.  On node1, cp a large file from
home directory to ocfs2 mountpoint.  While on node2, run
setfacl/getfacl.  Both nodes will hang up there.  The backtraces:

On node1:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_write_begin+0x43/0x1a0 [ocfs2]
  generic_perform_write+0xa9/0x180
  __generic_file_write_iter+0x1aa/0x1d0
  ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
  __vfs_write+0xc3/0x130
  vfs_write+0xb1/0x1a0
  SyS_write+0x46/0xa0

On node2:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
  ocfs2_set_acl+0x22d/0x260 [ocfs2]
  ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
  set_posix_acl+0x75/0xb0
  posix_acl_xattr_set+0x49/0xa0
  __vfs_setxattr+0x69/0x80
  __vfs_setxattr_noperm+0x72/0x1a0
  vfs_setxattr+0xa7/0xb0
  setxattr+0x12d/0x190
  path_setxattr+0x9f/0xb0
  SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8 ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/dax.c: fix inefficiency in dax_writeback_mapping_range() · 1eb643d0

Authored by Jan Kara on Jun 23, 2017

dax_writeback_mapping_range() fails to update the iteration index when
searching the radix tree for entries needing cache flushing.  Thus each
pagevec worth of entries is searched from the beginning, which is
inefficient and prone to livelocks.  Update the index properly.

Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz
Fixes: 9973c98e ("dax: add support for fsync/sync")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
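
The cost of not advancing the index in a batched scan can be seen in a toy model. The Python sketch below is not the radix-tree code: a plain list stands in for the mapping, `BATCH` stands in for one pagevec, and "entries examined" is an assumed cost measure. All names here are hypothetical.

```python
BATCH = 4  # stand-in for one pagevec worth of entries (assumed size)


def find_tagged(entries, start, batch=BATCH):
    """Scan entries from `start`; return (tagged indices found, entries examined)."""
    found, examined = [], 0
    for index in range(start, len(entries)):
        examined += 1
        if entries[index]:              # True means "needs flushing"
            found.append(index)
            if len(found) == batch:
                break
    return found, examined


def writeback(entries, advance_index):
    """Flush every tagged entry; return the total number of entries examined."""
    entries = list(entries)             # work on a copy
    start, total = 0, 0
    while True:
        found, examined = find_tagged(entries, start)
        total += examined
        if not found:
            return total
        for index in found:
            entries[index] = False      # flushing clears the tag
        if advance_index:
            start = found[-1] + 1       # resume after the last entry handled


entries = [True] * 16
print(writeback(entries, advance_index=True))    # 16
print(writeback(entries, advance_index=False))   # 56
```

In this model, restarting from index 0 re-examines every already-flushed entry on each batch, so the work grows roughly quadratically with the number of entries.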

autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL · 9fa4eb8e

Authored by NeilBrown on Jun 23, 2017

If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl,
autofs4_d_automount() will return

   ERR_PTR(status)

with that status to follow_automount(), which will then dereference an
invalid pointer.

So treat a positive status the same as zero, and map to ENOENT.

See comment in systemd src/core/automount.c::automount_send_ready().

Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Ian Kent <raven@themaw.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
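
Why a positive status is dangerous can be shown with a small Python model of error pointers. It assumes a 64-bit word and the usual convention that only the top `MAX_ERRNO` (4095) address values encode errors; `err_ptr`, `is_err`, and `sanitize_status` are illustrative stand-ins, not the kernel helpers themselves.

```python
MAX_ERRNO = 4095
ENOENT = 2
WORD_MASK = (1 << 64) - 1   # model a 64-bit pointer


def err_ptr(status):
    """Model of ERR_PTR(): reinterpret a status code as a pointer value."""
    return status & WORD_MASK


def is_err(ptr):
    """Model of IS_ERR(): only the top MAX_ERRNO values encode errors."""
    return ptr >= ((-MAX_ERRNO) & WORD_MASK)


def sanitize_status(status):
    """Treat a zero or positive status as -ENOENT, as the commit describes."""
    return status if status < 0 else -ENOENT


print(is_err(err_ptr(-ENOENT)))             # True: recognised as an error
print(is_err(err_ptr(5)))                   # False: looks like a valid pointer
print(is_err(err_ptr(sanitize_status(5))))  # True: mapped to -ENOENT first
```

A positive status produces a small, non-error pointer value, which a caller would go on to dereference; mapping it to -ENOENT first keeps it inside the error range.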

22 Jun, 2017 1 commit

xfs: don't allow bmap on rt files · eb5e248d

Authored by Darrick J. Wong on Jun 21, 2017

bmap returns a dumb LBA address but not the block device that goes with
that LBA. Swapfiles don't care about this and will blindly assume that
the data volume is the correct blockdev, which is totally bogus for
files on the rt subvolume. This results in the swap code doing IOs to
arbitrary locations on the data device(!) if the passed in mapping is a
realtime file, so just turn off bmap for rt files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

21 Jun, 2017 5 commits

CIFS: Fix some return values in case of error in 'crypt_message' · 517a6e43

Authored by Christophe Jaillet on Jun 11, 2017

'rc' is known to be 0 at this point. So if 'init_sg' or 'kzalloc' fails, we
should return -ENOMEM instead.

Also remove a useless 'rc' in a debug message as it is meaningless here.

Fixes: 026e93dc ("CIFS: Encrypt SMB3 requests before sending")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>

cifs: remove redundant return in cifs_creation_time_get · e125f528

Authored by Colin Ian King on Jun 07, 2017

There is a redundant return in function cifs_creation_time_get
that appears to be old vestigial code that can be removed. So
remove it.

Detected by CoverityScan, CID#1361924 ("Structurally dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>

CIFS: Improve readdir verbosity · dcd87838

Authored by Pavel Shilovsky on Jun 06, 2017

Downgrade the loglevel for SMB2 to prevent filling the log
with messages if e.g. readdir was interrupted. Also make SMB2
and SMB1 codepaths do the same logging during readdir.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>

CIFS: check if pages is null rather than bv for a failed allocation · ecf3411a

Authored by Colin Ian King on May 17, 2017

pages is being allocated; however, a null check on bv is being used
to see if the allocation failed. Fix this by checking if pages is
null.

Detected by CoverityScan, CID#1432974 ("Logically dead code")

Fixes: ccf7f408 ("CIFS: Add asynchronous context to support kernel AIO")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>

CIFS: Set ->should_dirty in cifs_user_readv() · 8a7b0d8e

Authored by Dan Carpenter on May 05, 2017

The current code causes a static checker warning because ITER_IOVEC is
zero so the condition is never true.

Fixes: 6685c5e2 ("CIFS: Add asynchronous read support through kernel AIO")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
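
The static checker warning comes down to a bitwise test against a constant that is zero. A minimal Python sketch of the pitfall follows; `ITER_IOVEC = 0` is taken from the commit message, while the value of `ITER_KVEC` and the equality comparison used as the fix are assumptions for illustration.

```python
ITER_IOVEC = 0   # value stated in the commit message
ITER_KVEC = 2    # assumed value of another iterator type, for contrast


def should_dirty_buggy(iter_type):
    """Bitwise AND with zero is always zero, so this can never be true."""
    return bool(iter_type & ITER_IOVEC)


def should_dirty_fixed(iter_type):
    """Compare for equality instead of testing bits (one plausible fix)."""
    return iter_type == ITER_IOVEC


print(should_dirty_buggy(ITER_IOVEC))   # False, even for the type we want
print(should_dirty_fixed(ITER_IOVEC))   # True
print(should_dirty_fixed(ITER_KVEC))    # False
```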

19 Jun, 2017 1 commit

mm: larger stack guard gap, between vmas · 1be7107f

Authored by Hugh Dickins on Jun 19, 2017

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs like gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunately.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
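
The sizes mentioned in this commit message can be compared with a rough Python sketch. The default of 256 pages is inferred here from "1MB on systems with 4k pages"; treating it as the default for every page size is an assumption. The rule "an allocation larger than the gap can land beyond it" is a deliberate simplification, and all function names are hypothetical.

```python
KIB = 1024
DEFAULT_GAP_PAGES = 256   # inferred: 1MB / 4kB pages


def stack_guard_gap_bytes(page_size, gap_pages=DEFAULT_GAP_PAGES):
    """Gap size in bytes for a gap expressed in page units."""
    return gap_pages * page_size


def can_skip_gap(allocation_bytes, gap_bytes):
    """Simplified: one allocation larger than the gap can land beyond it."""
    return allocation_bytes > gap_bytes


old_gap = stack_guard_gap_bytes(4 * KIB, gap_pages=1)   # single guard page
new_gap = stack_guard_gap_bytes(4 * KIB)                # 1MB on 4k pages

for name, size in [("64kB alloca()", 64 * KIB),
                   ("256kB gid_t buffer", 256 * KIB)]:
    print(name, can_skip_gap(size, old_gap), can_skip_gap(size, new_gap))
```

Both example allocations exceed a single 4kB guard page but fit inside the 1MB gap, which matches the commit's point that the larger gap covers typical large alloca() and VLA allocations without being a complete fix.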

18 Jun, 2017 3 commits

ufs: fix the logics for tail relocation · 77e9ce32

Authored by Al Viro on Jun 17, 2017

* original hysteresis loop got broken by typo back in 2002; now
  it never switches out of OPTTIME state.  Fixed.
* critical levels for switching from OPTTIME to OPTSPACE and back
  ought to be calculated once, at mount time.
* we should use mul_u64_u32_div() for those calculations, now that
  ->s_dsize is 64bit.
* to quote Kirk McKusick (in 1995 FreeBSD commit message):
    The threshold for switching from time-space and space-time is too small
    when minfree is 5%...so make it stay at space in this case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs_iget(): fail with -ESTALE on deleted inode · c0ef65d2

Authored by Al Viro on Jun 16, 2017

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix signedness of timestamps on ufs1 · 23ac7cba

Authored by Al Viro on Jun 16, 2017

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

17 Jun, 2017 1 commit

userfaultfd: shmem: handle coredumping in handle_userfault() · 64c2b203

Authored by Andrea Arcangeli on Jun 16, 2017

Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to
__get_user_pages().

shmem, by contrast, has no special FOLL_DUMP handling there, so
handle_mm_fault() is invoked without mmap_sem and ends up calling
handle_userfault(), which isn't expecting to be invoked without mmap_sem
held.

This makes handle_userfault() fail immediately if invoked through
shmem_vm_ops->fault during coredumping and solves the problem.

The side effect is a BUG_ON with no lock held triggered by the
coredumping process which exits.  Only 4.11 is affected, pre-4.11 anon
memory holes are skipped in __get_user_pages by checking FOLL_DUMP
explicitly against empty pagetables (mm/gup.c:no_page_table()).

It's zero cost as we already had a check for current->flags to prevent
futex from triggering userfaults during exit (PF_EXITING).

Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16 Jun, 2017 1 commit

fs: pass on flags in compat_writev · 20223f0f

Authored by Christoph Hellwig on Jun 16, 2017

Fixes: 793b80ef ("vfs: pass a flags argument to vfs_readv/vfs_writev")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15 Jun, 2017 13 commits

fs: don't forget to put old mntns in mntns_install · 4068367c

Authored by Andrei Vagin on Jun 08, 2017

Fixes: 4f757f3c ("make sure that mntns_install() doesn't end up with referral for root")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Hang/soft lockup in d_invalidate with simultaneous calls · 81be24d2

Authored by Al Viro on Jun 03, 2017

It's not hard to trigger a bunch of d_invalidate() on the same
dentry in parallel.  They end up fighting each other - any
dentry picked for removal by one will be skipped by the rest
and we'll go for the next iteration through the entire
subtree, even if everything is being skipped.  Moreover, we
immediately go back to scanning the subtree.  The only thing
we really need is to dissolve all mounts in the subtree and
as soon as we've nothing left to do, we can just unhash the
dentry and bugger off.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs_truncate_blocks(): fix the case when size is in the last direct block · a8fad984

Authored by Al Viro on Jun 15, 2017

The logic for deciding whether we need to do anything with direct blocks
is broken when the new size is within the last direct block. It's better to
find the path to the last byte _not_ to be removed and use that instead
of the path to the beginning of the first block to be freed...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: more deadlock prevention on tail unpacking · 289dec5b

Authored by Al Viro on Jun 15, 2017

->s_lock is not needed for ufs_change_blocknr()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: avoid grabbing ->truncate_mutex if possible · 09bf4f5b

Authored by Al Viro on Jun 15, 2017

tail unpacking is done in a wrong place; the deadlocks galore
is best dealt with by doing that in ->write_iter() (and switching
to iomap, while we are at it), but that's rather painful to
backport.  The trouble comes from grabbing pages that cover
the beginning of tail from inside of ufs_new_fragments(); ongoing
pageout of any of those is going to deadlock on ->truncate_mutex
with process that got around to extending the tail holding that
and waiting for page to get unlocked, while ->writepage() on
that page is waiting on ->truncate_mutex.

The thing is, we don't need ->truncate_mutex when the fragment
we are trying to map is within the tail - the damn thing is
allocated (tail can't contain holes).

Let's do a plain lookup and if the fragment is present, we can
just pretend that we'd won the race in almost all cases.  The
only exception is a fragment between the end of tail and the
end of block containing tail.

Protect ->i_lastfrag with ->meta_lock - read_seqlock_excl() is
sufficient.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs_get_locked_page(): make sure we have buffer_heads · 267309f3

Authored by Al Viro on Jun 14, 2017

callers rely upon that, but find_lock_page() racing with attempt of
page eviction by memory pressure might have left us with
	* try_to_free_buffers() successfully done
	* __remove_mapping() failed, leaving the page in our mapping
	* find_lock_page() returning an uptodate page with no
	  buffer_heads attached.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: fix s_size/s_dsize users · c596961d

Authored by Al Viro on Jun 14, 2017

For UFS2 we need 64bit variants; we even store them in uspi, but
use 32bit ones instead.  One wrinkle is in handling of reserved
space - recalculating it every time had been stupid all along, but
now it would become really ugly.  Just calculate it once...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: fix reserved blocks check · b451cec4

Authored by Al Viro on Jun 14, 2017

a) honour ->s_minfree; don't just go with default (5)
b) don't bother with capability checks until we know we'll need them

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: make ufs_freespace() return signed · fffd70f5

Authored by Al Viro on Jun 14, 2017

as it is, checking that its return value is <= 0 is useless and
that's how it's being used.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
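
Why a "<= 0" check is useless on an unsigned return value can be shown with a short Python model. The 32-bit width, the `free - reserved` formula, and the helper names are assumptions chosen for illustration; the commit message only states that the return type becomes signed.

```python
def as_u32(value):
    """Model storing a result in an unsigned 32-bit integer."""
    return value & 0xFFFFFFFF


def freespace(free_blocks, reserved_blocks):
    """Hypothetical free-space calculation that can go negative."""
    return free_blocks - reserved_blocks


deficit = freespace(100, 250)     # more reserved than free: -150

print(deficit <= 0)               # True: a signed result reveals the shortfall
print(as_u32(deficit))            # 4294967146: the same bits read as unsigned
print(as_u32(deficit) <= 0)       # False: the check silently passes
```

With an unsigned type the comparison can only ever catch an exact zero, so a shortfall shows up as a huge amount of free space instead.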

ufs: fix logics in "ufs: make fsck -f happy" · 96ecff14

Authored by Al Viro on Jun 14, 2017

Storing stats _only_ at new locations is wrong for UFS1; old
locations should always be kept updated.  The check for "has
been converted to use of new locations" is also wrong - it
should be "->fs_maxbsize is equal to ->fs_bsize".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ceph: unify inode i_ctime update · 4ca2fea6

Authored by Yan, Zheng on Jun 01, 2017

Current __ceph_setattr() can set inode's i_ctime to current_time(),
req->r_stamp or attr->ia_ctime. These time stamps may have minor
differences. This may cause problems.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: use current_kernel_time() to get request time stamp · 56199016

Authored by Yan, Zheng on Jun 01, 2017

ceph uses ktime_get_real_ts() to get request time stamp. In most
other cases, current_kernel_time() is used to get time stamp for
filesystem operations (called by current_time()).

There is a granularity difference between ktime_get_real_ts() and
current_kernel_time(). The latter can be up to one jiffy behind
the former. This can cause an inode's ctime to go backwards.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
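
How mixing two clocks of different granularity can move a timestamp backwards is easy to model. In the Python sketch below, `fine_clock` stands in for a precise clock and `coarse_clock` for one that only advances once per tick; the tick length (4000 microseconds, i.e. HZ=250) and both function names are assumptions for illustration.

```python
JIFFY_US = 4000   # one tick in microseconds, assuming HZ=250


def fine_clock(now_us):
    """Precise clock: returns the current time unchanged."""
    return now_us


def coarse_clock(now_us):
    """Coarse clock: returns the time of the most recent tick."""
    return now_us - now_us % JIFFY_US


t1, t2 = 10_700, 10_900   # two operations, 200 microseconds apart

mixed = [fine_clock(t1), coarse_clock(t2)]
unified = [coarse_clock(t1), coarse_clock(t2)]

print(mixed)     # [10700, 8000]: the later stamp is older
print(unified)   # [8000, 8000]: never goes backwards
```

Using a single clock source for every stamp keeps the sequence monotonic, even if it is less precise.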

ceph: check i_nlink while converting a file handle to dentry · 03f21904

Authored by Luis Henriques on May 17, 2017

Converting a file handle to a dentry can be done after the inode has
been unlinked.  This means that __fh_to_dentry() requires an extra check
to verify the number of links is not 0.

The issue can be easily reproduced using xfstest generic/426, which does
something like:

    name_to_handle_at(&fh)
    echo 3 > /proc/sys/vm/drop_caches
    unlink()
    open_by_handle_at(&fh)

The call to open_by_handle_at() should fail, as the file doesn't exist
anymore.

Link: http://tracker.ceph.com/issues/19958
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

12 Jun, 2017 2 commits

configfs: Introduce config_item_get_unless_zero() · 19e72d3a

Authored by Bart Van Assche on Feb 09, 2017

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
[hch: minor style tweak]
Signed-off-by: Christoph Hellwig <hch@lst.de>

configfs: Fix race between create_link and configfs_rmdir · ba80aa90

Authored by Nicholas Bellinger on Jun 08, 2017

This patch closes a long standing race in configfs between
the creation of a new symlink in create_link(), while the
symlink target's config_item is being concurrently removed
via configfs_rmdir().

This can happen because the symlink target's reference
is obtained by config_item_get() in create_link() before
the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep()
during configfs_rmdir() shutdown is actually checked.

This originally manifested itself on ppc64 on v4.8.y under
heavy load using ibmvscsi target ports with Novalink API:

[ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
[ 7879.893760] ------------[ cut here ]------------
[ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
[ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O 4.8.17-customv2.22 #12
[ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
[ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
[ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G O     (4.8.17-customv2.22)
[ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
[ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
                GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
                GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
                GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
                GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
                GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
                GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
                GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
                GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
[ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
[ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893842] Call Trace:
[ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
[ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
[ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
[ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
[ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
[ 7879.893856] Instruction dump:
[ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
[ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
[ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---

To close this race, go ahead and obtain the symlink's target
config_item reference only after the existing CONFIGFS_USET_DROPPING
check succeeds.

This way, if configfs_rmdir() wins, create_link() will return -ENOENT,
and if create_link() wins, configfs_rmdir() will return -EBUSY.

Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
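
The two outcomes described at the end of this commit message can be illustrated with a single-threaded Python model. It ignores locking entirely and only shows the ordering "check the dropping flag first, then take the reference". The `Target` class, its fields, and the assumption that the intended error for the losing create_link() is -ENOENT are all illustrative, not taken from the configfs source.

```python
import errno


class Target:
    """Toy stand-in for a symlink target; field names are illustrative."""

    def __init__(self):
        self.dropping = False   # plays the role of CONFIGFS_USET_DROPPING
        self.links = 0          # symlinks pointing at this item
        self.refs = 1           # reference count


def create_link(target):
    if target.dropping:         # check first ...
        return -errno.ENOENT
    target.links += 1
    target.refs += 1            # ... and only then take the reference
    return 0


def rmdir(target):
    if target.links:            # an existing symlink keeps the item busy
        return -errno.EBUSY
    target.dropping = True
    return 0


first = Target()
print(rmdir(first), create_link(first))      # rmdir wins: 0, then -ENOENT

second = Target()
print(create_link(second), rmdir(second))    # create_link wins: 0, then -EBUSY
```

In the losing create_link() case no reference is taken at all, so nothing ever touches an item whose count may already have reached zero.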
