提交 · 8242c9f35ae77ca21e6cf79827787789a988f44c · openanolis / cloud-kernel

04 5月, 2017 5 次提交

ceph: fix wrong check in ceph_renew_caps() · 8242c9f3

由 Yan, Zheng 提交于 3月 25, 2017

Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

8242c9f3

libceph: convert ceph_pagelist.refcnt from atomic_t to refcount_t · 0e1a5ee6

由 Elena Reshetova 提交于 3月 17, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

0e1a5ee6

ceph: convert ceph_cap_snap.nref from atomic_t to refcount_t · 805692d0

由 Elena Reshetova 提交于 3月 03, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

805692d0

ceph: convert ceph_mds_session.s_ref from atomic_t to refcount_t · 3997c01d

由 Elena Reshetova 提交于 3月 03, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

3997c01d

libceph, ceph: always advertise all supported features · 74da4a0f

由 Ilya Dryomov 提交于 3月 03, 2017

No reason to hide CephFS-specific features in the rbd case.  Recent
feature bits mix RADOS and CephFS-specific stuff together anyway.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

74da4a0f

28 4月, 2017 1 次提交

statx: correct error handling of NULL pathname · 59372bbf

由 Michael Kerrisk (man-pages) 提交于 4月 27, 2017

The change in commit 1e2f82d1 ("statx: Kill fd-with-NULL-path
support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
statx() is inconsistent.

It results in the error EINVAL for a NULL pathname.  Other system calls
with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.

The solution is simply to remove the EINVAL check.  As I already pointed
out in [1], user_path_at*() and filename_lookup() will handle the NULL
pathname as per the other APIs, to correctly produce the error EFAULT.

[1] https://lkml.org/lkml/2017/4/26/561Signed-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

59372bbf

27 4月, 2017 1 次提交

statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH · 1e2f82d1

由 David Howells 提交于 4月 26, 2017

With the new statx() syscall, the following both allow the attributes of
the file attached to a file descriptor to be retrieved:

	statx(dfd, NULL, 0, ...);

and:

	statx(dfd, "", AT_EMPTY_PATH, ...);

Change the code to reject the first option, though this means copying
the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
non-directory provided path is "".

[ The timing of this isn't wonderful, but applying this now before we
  have statx() in any released kernel, before anybody starts using the
  NULL special case.    - Linus ]

Fixes: a528d35e ("statx: Add a system call to make enhanced file info available")
Reported-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Eric Sandeen <sandeen@sandeen.net>
cc: fstests@vger.kernel.org
cc: linux-api@vger.kernel.org
cc: linux-man@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1e2f82d1

26 4月, 2017 4 次提交

nfsd: stricter decoding of write-like NFSv2/v3 ops · 13bf9fbf

由 J. Bruce Fields 提交于 4月 21, 2017

The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer.  This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative.  Add
checks to catch these.
Reported-by: NTuomas Haanpää <thaan@synopsys.com>
Reported-by: NAri Kauppi <ari@synopsys.com>
Reviewed-by: NNeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

13bf9fbf

nfsd4: minor NFSv2/v3 write decoding cleanup · db44bac4

由 J. Bruce Fields 提交于 4月 25, 2017

Use a couple shortcuts that will simplify a following bugfix.

Cc: stable@vger.kernel.org
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

db44bac4

nfsd: check for oversized NFSv2/v3 arguments · e6838a29

由 J. Bruce Fields 提交于 4月 21, 2017

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply.  This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE).  But a client that sends an incorrectly long reply
can violate those assumptions.  This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended.  That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: NTuomas Haanpää <thaan@synopsys.com>
Reported-by: NAri Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

e6838a29

ceph: fix recursion between ceph_set_acl() and __ceph_setattr() · 8179a101

由 Yan, Zheng 提交于 4月 19, 2017

ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().

The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.

Cc: stable@vger.kernel.org # needs backporting for < 4.9
Link: http://tracker.ceph.com/issues/19688Reported-by: NJerry Lee <leisurelysw24@gmail.com>
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Tested-by: NLuis Henriques <lhenriques@suse.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

8179a101

20 4月, 2017 1 次提交

nsfs: mark dentry with DCACHE_RCUACCESS · 073c516f

由 Cong Wang 提交于 4月 19, 2017

Andrey reported a use-after-free in __ns_get_path():

  spin_lock include/linux/spinlock.h:299 [inline]
  lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
  __ns_get_path+0x197/0x860 fs/nsfs.c:66
  open_related_ns+0xda/0x200 fs/nsfs.c:143
  sock_ioctl+0x39d/0x440 net/socket.c:1001
  vfs_ioctl fs/ioctl.c:45 [inline]
  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
  SYSC_ioctl fs/ioctl.c:700 [inline]
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691

We are under rcu read lock protection at that point:

        rcu_read_lock();
        d = atomic_long_read(&ns->stashed);
        if (!d)
                goto slow;
        dentry = (struct dentry *)d;
        if (!lockref_get_not_dead(&dentry->d_lockref))
                goto slow;
        rcu_read_unlock();

but don't use a proper RCU API on the free path, therefore a parallel
__d_free() could free it at the same time.  We need to mark the stashed
dentry with DCACHE_RCUACCESS so that __d_free() will be called after all
readers leave RCU.

Fixes: e149ed2b ("take the targets of /proc/*/ns/* symlinks to separate fs")
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

073c516f

19 4月, 2017 2 次提交

btrfs: qgroup: move noisy underflow warning to debugging build · 338bd52f

由 David Sterba 提交于 4月 18, 2017

The WARN_ON and warning from report_reserved_underflow can become very
noisy and is visible unconditionally although this is namely for
debugging. The patch "btrfs: Add WARN_ON for qgroup reserved underflow"
(18dc22c1) went to 4.11-rc1 and the plan
was to get the fix as well, but this hasn't happened.

CC: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

338bd52f

ubifs: Fix O_TMPFILE corner case in ubifs_link() · 32fe905c

由 Richard Weinberger 提交于 3月 30, 2017

It is perfectly fine to link a tmpfile back using linkat().
Since tmpfiles are created with a link count of 0 they appear
on the orphan list, upon re-linking the inode has to be removed
from the orphan list again.

Ralph faced a filesystem corruption in combination with overlayfs
due to this bug.

Cc: <stable@vger.kernel.org>
Cc: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Reported-by: NRalph Sennhauser <ralph.sennhauser@gmail.com>
Tested-by: NRalph Sennhauser <ralph.sennhauser@gmail.com>
Reported-by: NAmir Goldstein <amir73il@gmail.com>
Fixes: 474b9370 ("ubifs: Implement O_TMPFILE")
Signed-off-by: NRichard Weinberger <richard@nod.at>

32fe905c

18 4月, 2017 3 次提交

cifs: Do not send echoes before Negotiate is complete · 62a6cfdd

由 Sachin Prabhu 提交于 4月 16, 2017

commit 4fcd1813 ("Fix reconnect to not defer smb3 session reconnect
long after socket reconnect") added support for Negotiate requests to
be initiated by echo calls.

To avoid delays in calling echo after a reconnect, I added the patch
introduced by the commit b8c60012 ("Call echo service immediately
after socket reconnect").

This has however caused a regression with cifs shares which do not have
support for echo calls to trigger Negotiate requests. On connections
which need to call Negotiation, the echo calls trigger an error which
triggers a reconnect which in turn triggers another echo call. This
results in a loop which is only broken when an operation is performed on
the cifs share. For an idle share, it can DOS a server.

The patch uses the smb_operation can_echo() for cifs so that it is
called only if connection has been already been setup.

kernel bz: 194531
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Tested-by: NJonathan Liu <net147@gmail.com>
Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

62a6cfdd

fix nfs O_DIRECT advancing iov_iter too much · 85128b2b

由 Al Viro 提交于 4月 13, 2017

It leaves the iterator advanced by the amount of IO it has requested
instead of the amount actually transferred.  Among other things,
that confuses the hell out of generic_file_splice_read().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

85128b2b

orangefs_bufmap_copy_from_iovec(): fix EFAULT handling · 890559e3

由 Al Viro 提交于 4月 13, 2017

short copy here should mean instant EFAULT, not "move to the
next page and hope it fails there, this time with nothing
copied"
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

890559e3

16 4月, 2017 2 次提交

orangefs: free superblock when mount fails · 1ec1688c

由 Martin Brandenburg 提交于 4月 14, 2017

Otherwise lockdep says:

[ 1337.483798] ================================================
[ 1337.483999] [ BUG: lock held when returning to user space! ]
[ 1337.484252] 4.11.0-rc6 #19 Not tainted
[ 1337.484423] ------------------------------------------------
[ 1337.484626] mount/14766 is leaving the kernel with locks still held!
[ 1337.484841] 1 lock held by mount/14766:
[ 1337.485017]  #0:  (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520

Caught by xfstests generic/413 which tried to mount with the unsupported
mount option dax.  Then xfstests generic/422 ran sync which deadlocks.
Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
Acked-by: NMike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ec1688c

vfs: don't do RCU lookup of empty pathnames · c0eb027e

由 Linus Torvalds 提交于 4月 02, 2017

Normal pathname lookup doesn't allow empty pathnames, but using
AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you
can trigger an empty pathname lookup.

And not only is the RCU lookup in that case entirely unnecessary
(because we'll obviously immediately finalize the end result), it is
actively wrong.

Why? An empth path is a special case that will return the original
'dirfd' dentry - and that dentry may not actually be RCU-free'd,
resulting in a potential use-after-free if we were to initialize the
path lazily under the RCU read lock and depend on complete_walk()
finalizing the dentry.

Found by syzkaller and KASAN.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0eb027e

14 4月, 2017 2 次提交

hugetlbfs: fix offset overflow in hugetlbfs mmap · 045c7a3f

由 Mike Kravetz 提交于 4月 13, 2017

If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start.  Offset could be a negative value when
represented as a loff_t.  The offset plus length will be used to update
the file size (i_size) which is also a loff_t.

Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.

Found by syzcaller with commit ff8c0c53 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied.  Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.

To reproduce:

   mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);

Resulted in,

  kernel BUG at mm/hugetlb.c:742!
  Call Trace:
   hugetlbfs_evict_inode+0x80/0xa0
   evict+0x24a/0x620
   iput+0x48f/0x8c0
   dentry_unlink_inode+0x31f/0x4d0
   __dentry_kill+0x292/0x5e0
   dput+0x730/0x830
   __fput+0x438/0x720
   ____fput+0x1a/0x20
   task_work_run+0xfe/0x180
   exit_to_usermode_loop+0x133/0x150
   syscall_return_slowpath+0x184/0x1c0
   entry_SYSCALL_64_fastpath+0xab/0xad

Fixes: ff8c0c53 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.comReported-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

045c7a3f

thp: fix MADV_DONTNEED vs clear soft dirty race · 5b7abeae

由 Kirill A. Shutemov 提交于 4月 13, 2017

Yet another instance of the same race.

Fix is identical to change_huge_pmd().

See "thp: fix MADV_DONTNEED vs.  numa balancing race" for more details.

Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b7abeae

13 4月, 2017 2 次提交

nfsd: fix oops on unsupported operation · 05b7278d

由 Olga Kornievskaia 提交于 3月 23, 2017

I'm hitting the BUG in nfsd4_max_reply() at fs/nfsd/nfs4proc.c:2495 when
client sends an operation the server doesn't support.

in nfsd4_max_reply() it checks for NULL rsize_bop but a non-supported
operation wouldn't have that set.

Cc: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 2282cd2c "NFSD: Get response size before operation..."
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

05b7278d

CIFS: Fix SMB3 mount without specifying a security mechanism · 67dbea2c

由 Pavel Shilovsky 提交于 4月 12, 2017

Commit ef65aaed ("smb2: Enforce sec= mount option") changed the
behavior of a mount command to enforce a specified security mechanism
during mounting. On another hand according to the spec if SMB3 server
doesn't respond with a security context it implies that it supports
NTLMSSP. The current code doesn't keep it in mind and fails a mount
for such servers if no security mechanism is specified. Fix this by
indicating that a server supports NTLMSSP if a security context isn't
returned during negotiate phase. This allows the code to use NTLMSSP
by default for SMB3 mounts.
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

67dbea2c

12 4月, 2017 4 次提交

Btrfs: fix potential use-after-free for cloned bio · a967efb3

由 Liu Bo 提交于 4月 10, 2017

KASAN reports that there is a use-after-free case of bio in btrfs_map_bio.

If we need to submit IOs to several disks at a time, the original bio
would get cloned and mapped to the destination disk, but we really should
use the original bio instead of a cloned bio to do the sanity check
because cloned bios are likely to be freed by its endio.
Reported-by: NDiego <diegocg@gmail.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

a967efb3

Btrfs: fix segmentation fault when doing dio read · 97bf5a55

由 Liu Bo 提交于 4月 07, 2017

Commit 2dabb324 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
introduced this bug during iterating bio pages in dio read's endio hook,
and it could end up with segment fault of the dio reading task.

So the reason is 'if (nr_sectors--)', and it makes the code assume that
there is one more block in the same page, so page offset is increased and
the bio which is created to repair the bad block then has an incorrect
bvec.bv_offset, and a later access of the page content would throw a
segmentation fault.

This also adds ASSERT to check page offset against page size.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

97bf5a55

Btrfs: fix invalid dereference in btrfs_retry_endio · 2e949b0a

由 Liu Bo 提交于 4月 05, 2017

When doing directIO repair, we have this oops:

[ 1458.532816] general protection fault: 0000 [#1] SMP
...
[ 1458.536291] Workqueue: btrfs-endio-repair btrfs_endio_repair_helper [btrfs]
[ 1458.536893] task: ffff88082a42d100 task.stack: ffffc90002b3c000
[ 1458.537499] RIP: 0010:btrfs_retry_endio+0x7e/0x1a0 [btrfs]
...
[ 1458.543261] Call Trace:
[ 1458.543958]  ? rcu_read_lock_sched_held+0xc4/0xd0
[ 1458.544374]  bio_endio+0xed/0x100
[ 1458.544750]  end_workqueue_fn+0x3c/0x40 [btrfs]
[ 1458.545257]  normal_work_helper+0x9f/0x900 [btrfs]
[ 1458.545762]  btrfs_endio_repair_helper+0x12/0x20 [btrfs]
[ 1458.546224]  process_one_work+0x34d/0xb70
[ 1458.546570]  ? process_one_work+0x29e/0xb70
[ 1458.546938]  worker_thread+0x1cf/0x960
[ 1458.547263]  ? process_one_work+0xb70/0xb70
[ 1458.547624]  kthread+0x17d/0x180
[ 1458.547909]  ? kthread_create_on_node+0x70/0x70
[ 1458.548300]  ret_from_fork+0x31/0x40

It turns out that btrfs_retry_endio is trying to get inode from a directIO
page.

This fixes the problem by using the saved inode pointer, done->inode.
btrfs_retry_endio_nocsum has the same problem, and it's fixed as well.

Also cleanup unused @start (which is too trivial for a separate patch).

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

2e949b0a

btrfs: drop the nossd flag when remounting with -o ssd · 951e7966

由 Adam Borowski 提交于 3月 31, 2017

The opposite case was already handled right in the very next switch entry.
And also when turning on nossd, drop ssd_spread.
Reported-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: NAdam Borowski <kilobyte@angband.pl>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

951e7966

11 4月, 2017 5 次提交

CIFS: store results of cifs_reopen_file to avoid infinite wait · 1fa839b4

由 Germano Percossi 提交于 4月 07, 2017

This fixes Continuous Availability when errors during
file reopen are encountered.

cifs_user_readv and cifs_user_writev would wait for ever if
results of cifs_reopen_file are not stored and for later inspection.

In fact, results are checked and, in case of errors, a chain
of function calls leading to reads and writes to be scheduled in
a separate thread is skipped.
These threads will wake up the corresponding waiters once reads
and writes are done.

However, given the return value is not stored, when rc is checked
for errors a previous one (always zero) is inspected instead.
This leads to pending reads/writes added to the list, making
cifs_user_readv and cifs_user_writev wait for ever.
Signed-off-by: NGermano Percossi <germano.percossi@citrix.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

1fa839b4

CIFS: remove bad_network_name flag · a0918f1c

由 Germano Percossi 提交于 4月 07, 2017

STATUS_BAD_NETWORK_NAME can be received during node failover,
causing the flag to be set and making the reconnect thread
always unsuccessful, thereafter.

Once the only place where it is set is removed, the remaining
bits are rendered moot.

Removing it does not prevent "mount" from failing when a non
existent share is passed.

What happens when the share really ceases to exist while the
share is mounted is undefined now as much as it was before.
Signed-off-by: NGermano Percossi <germano.percossi@citrix.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

a0918f1c

CIFS: reconnect thread reschedule itself · 18ea4311

由 Germano Percossi 提交于 4月 07, 2017

In case of error, smb2_reconnect_server reschedule itself
with a delay, to avoid being too aggressive.
Signed-off-by: NGermano Percossi <germano.percossi@citrix.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

18ea4311

CIFS: handle guest access errors to Windows shares · 40920c2b

由 Mark Syms 提交于 11月 29, 2016

Commit 1a967d6c ("correctly to
anonymous authentication for the NTLM(v2) authentication") introduces
a regression in handling errors related to attempting a guest
connection to a Windows share which requires authentication. This
should result in a permission denied error but actually causes the
kernel module to enter a never-ending loop trying to follow a DFS
referal which doesn't exist.

The base cause of this is the failure now occurs later in the process
during tree connect and not at the session setup setup and all errors
in tree connect are interpreted as needing to follow the DFS paths
which isn't in this case correct. So, check the returned error against
EACCES and fail if this is returned error.

Feedback from Aurelien:

PS> net user guest /activate:no
PS> mkdir C:\guestshare
PS> icacls C:\guestshare /grant 'Everyone:(OI)(CI)F'
PS> new-smbshare -name guestshare -path C:\guestshare -fullaccess Everyone

I've tested v3.10, v4.4, master, master+your patch using default options
(empty or no user "NU") and user=abc (U).

NT_LOGON_FAILURE in session setup: LF
This is what you seem to have in 3.10.

NT_ACCESS_DENIED in tree connect to the share: AD
This is what you get before your infinite loop.

No infinite DFS loop :(
All these issues result in mount failing very fast with permission denied.

I guess it could be from either the Windows version or the share/folder
ACL. A deeper analysis of the packets might reveal more.

In any case I did not notice any issues for on a basic DFS setup with
the patch so I don't think it introduced any regressions, which is
probably all that matters. It still bothers me a little I couldn't hit
the bug.

I've included kernel output w/ debugging output and network capture of
my tests if anyone want to have a look at it. (master+patch = ml-guestfix).
Signed-off-by: NMark Syms <mark.syms@citrix.com>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Tested-by: NAurelien Aptel <aaptel@suse.com>
Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

40920c2b

CIFS: Fix null pointer deref during read resp processing · 350be257

由 Pavel Shilovsky 提交于 4月 10, 2017

Currently during receiving a read response mid->resp_buf can be
NULL when it is being passed to cifs_discard_remaining_data() from
cifs_readv_discard(). Fix it by always passing server->smallbuf
instead and initializing mid->resp_buf at the end of read response
processing.
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: NSachin Prabhu <sprabhu@redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

350be257

08 4月, 2017 5 次提交

sysfs: be careful of error returns from ops->show() · c8a139d0

由 NeilBrown 提交于 4月 03, 2017

ops->show() can return a negative error code.
Commit 65da3484 ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.

Commit 17d0774f ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.

This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.

I can reproduce it reliably only by putting delay like
	usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
  echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state

The bug can be triggered without the usleep.

Fixes: 65da3484 ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NNeilBrown <neilb@suse.com>
Acked-by: NTejun Heo <tj@kernel.org>
Reported-and-tested-by: NMiroslav Benes <mbenes@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c8a139d0

dax: fix radix tree insertion race · e11f8b7b

由 Ross Zwisler 提交于 4月 07, 2017

While running generic/340 in my test setup I hit the following race.  It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.

Thread 1				Thread 2
--------				--------
dax_iomap_pmd_fault()
  grab_mapping_entry()
    spin_lock_irq()
    get_unlocked_mapping_entry()
    'entry' is NULL, can't call lock_slot()
    spin_unlock_irq()
    radix_tree_preload()
					dax_iomap_pmd_fault()
					  grab_mapping_entry()
					    spin_lock_irq()
					    get_unlocked_mapping_entry()
					    ...
					    lock_slot()
					    spin_unlock_irq()
					  dax_pmd_insert_mapping()
					    <inserts a PMD mapping>
    spin_lock_irq()
    __radix_tree_insert() fails with -EEXIST
    <fall back to 4k fault, and die horribly
     when inserting a 4k entry where a PMD exists>

The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index.  For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.

So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.

This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop.  It used to fail within the first 10 iterations.

Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>    [4.10+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e11f8b7b

userfaultfd: report actual registered features in fdinfo · 045098e9

由 Mike Rapoport 提交于 4月 07, 2017

fdinfo for userfault file descriptor reports UFFD_API_FEATURES. Up
until recently, the UFFD_API_FEATURES was defined as 0, therefore
corresponding field in fdinfo always contained zero. Now, with
introduction of several additional features, UFFD_API_FEATURES is not
longer 0 and it seems better to report actual features requested for the
userfaultfd object described by the fdinfo.

First, the applications that were using userfault will still see zero at
the features field in fdinfo. Next, reporting actual features rather
than available features, gives clear indication of what userfault
features are used by an application.

Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

045098e9

orangefs: move features validation to fix filesystem hang · cefdc26e

由 Martin Brandenburg 提交于 4月 06, 2017

Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).

The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked.  When the userspace component restarts while the filesystem
is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two.  While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.

This is only half of the bugfix.  The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns.  Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores.  The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.

Mike Marshall says:
 "I've tested this patch against the fixed userspace part. This patch is
  real important, I hope it can make it into 4.11...

  Here's what happens when the userspace daemon is restarted, without
  the patch:

    =============================================
    [ INFO: possible recursive locking detected ]
    [   4.10.0-00007-ge98bdb30 #1 Not tainted    ]
    ---------------------------------------------
    pvfs2-client-co/29032 is trying to acquire lock:
     (orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
                  but task is already holding lock:
     (orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]

    CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb30 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
    Call Trace:
     __lock_acquire+0x7eb/0x1290
     lock_acquire+0xe8/0x1d0
     mutex_lock_killable_nested+0x6f/0x6e0
     service_operation+0x3c7/0x7b0 [orangefs]
     orangefs_remount+0xea/0x150 [orangefs]
     dispatch_ioctl_command+0x227/0x330 [orangefs]
     orangefs_devreq_ioctl+0x29/0x70 [orangefs]
     do_vfs_ioctl+0xa3/0x6e0
     SyS_ioctl+0x79/0x90"
Signed-off-by: NMartin Brandenburg <martin@omnibond.com>
Acked-by: NMike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cefdc26e

sysctl: add sanity check for proc_douintvec · 1680a386

由 Liping Zhang 提交于 4月 07, 2017

Commit e7d316a0 ("sysctl: handle error writing UINT_MAX to u32
fields") introduced the proc_douintvec helper function, but it forgot to
add the related sanity check when doing register_sysctl_table.  So add
it now.
Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1680a386

07 4月, 2017 3 次提交

Reset TreeId to zero on SMB2 TREE_CONNECT · 806a28ef

由 Jan-Marek Glogowski 提交于 2月 20, 2017

Currently the cifs module breaks the CIFS specs on reconnect as
described in http://msdn.microsoft.com/en-us/library/cc246529.aspx:

"TreeId (4 bytes): Uniquely identifies the tree connect for the
command. This MUST be 0 for the SMB2 TREE_CONNECT Request."
Signed-off-by: NJan-Marek Glogowski <glogow@fbihome.de>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Tested-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>

806a28ef

CIFS: Fix build failure with smb2 · 4fa8e504

由 Tobias Regnery 提交于 3月 30, 2017

I saw the following build error during a randconfig build:

fs/cifs/smb2ops.c: In function 'smb2_new_lease_key':
fs/cifs/smb2ops.c:1104:2: error: implicit declaration of function 'generate_random_uuid' [-Werror=implicit-function-declaration]

Explicit include the right header to fix this issue.
Signed-off-by: NTobias Regnery <tobias.regnery@gmail.com>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

4fa8e504

Introduce cifs_copy_file_range() · 620d8745

由 Sachin Prabhu 提交于 2月 10, 2017

The earlier changes to copy range for cifs unintentionally disabled the more
common form of server side copy.

The patch introduces the file_operations helper cifs_copy_file_range()
which is used by the syscall copy_file_range. The new file operations
helper allows us to perform server side copies for SMB2.0 and 2.1
servers as well as SMB 3.0+ servers which do not support the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE.

The new helper uses the ioctl FSCTL_SRV_COPYCHUNK_WRITE to perform
server side copies. The helper is called by vfs_copy_file_range() only
once an attempt to clone the file using the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE has failed.
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: Stable  <stable@vger.kernel.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

620d8745

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功