提交 · 2b35e66e01f7ac24a8f4088e6d5594c146b0067b · openanolis / cloud-kernel

22 3月, 2019 4 次提交

netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs · 2b35e66e

由 Jann Horn 提交于 2月 06, 2019

commit c4c07b4d6fa1f11880eab8e076d3d060ef3f55fc upstream.

The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.

(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)

Fixes: cc2d5863 ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: NJann Horn <jannh@google.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2b35e66e

exec: Fix mem leak in kernel_read_file · 2b8f7782

由 YueHaibing 提交于 2月 19, 2019

commit f612acfae86af7ecad754ae6a46019be9da05b8e upstream.

syzkaller report this:
BUG: memory leak
unreferenced object 0xffffc9000488d000 (size 9195520):
  comm "syz-executor.0", pid 2752, jiffies 4294787496 (age 18.757s)
  hex dump (first 32 bytes):
    ff ff ff ff ff ff ff ff a8 00 00 00 01 00 00 00  ................
    02 00 00 00 00 00 00 00 80 a1 7a c1 ff ff ff ff  ..........z.....
  backtrace:
    [<000000000863775c>] __vmalloc_node mm/vmalloc.c:1795 [inline]
    [<000000000863775c>] __vmalloc_node_flags mm/vmalloc.c:1809 [inline]
    [<000000000863775c>] vmalloc+0x8c/0xb0 mm/vmalloc.c:1831
    [<000000003f668111>] kernel_read_file+0x58f/0x7d0 fs/exec.c:924
    [<000000002385813f>] kernel_read_file_from_fd+0x49/0x80 fs/exec.c:993
    [<0000000011953ff1>] __do_sys_finit_module+0x13b/0x2a0 kernel/module.c:3895
    [<000000006f58491f>] do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    [<00000000ee78baf4>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<00000000241f889b>] 0xffffffffffffffff

It should goto 'out_free' lable to free allocated buf while kernel_read
fails.

Fixes: 39d637af ("vfs: forbid write access when reading a file into memory")
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: Thibaut Sautereau <thibaut@sautereau.fr>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

2b8f7782

sctp: use memdup_user instead of vmemdup_user · 5dc16ac5

由 Xin Long 提交于 3月 20, 2019

commit ef82bcfa671b9a635bab5fa669005663d8b177c5 upstream.

In sctp_setsockopt_bindx()/__sctp_setsockopt_connectx(), it allocates
memory with addrs_size which is passed from userspace. We used flag
GFP_USER to put some more restrictions on it in Commit cacc0621
("sctp: use GFP_USER for user-controlled kmalloc").

However, since Commit c981f254 ("sctp: use vmemdup_user() rather
than badly open-coding memdup_user()"), vmemdup_user() has been used,
which doesn't check GFP_USER flag when goes to vmalloc_*(). So when
addrs_size is a huge value, it could exhaust memory and even trigger
oom killer.

This patch is to use memdup_user() instead, in which GFP_USER would
work to limit the memory allocation with a huge addrs_size.

Note we can't fix it by limiting 'addrs_size', as there's no demand
for it from RFC.

Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

5dc16ac5

mm: enforce min addr even if capable() in expand_downwards() · bd4f9261

由 Jann Horn 提交于 2月 27, 2019

commit 0a1d52994d440e21def1c2174932410b4f2a98a1 upstream.

security_mmap_addr() does a capability check with current_cred(), but
we can reach this code from contexts like a VFS write handler where
current_cred() must not be used.

This can be abused on systems without SMAP to make NULL pointer
dereferences exploitable again.

Fixes: 8869477a ("security: protect from stack expansion into low vm addresses")
Cc: stable@kernel.org
Signed-off-by: NJann Horn <jannh@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

bd4f9261

15 3月, 2019 3 次提交

net: kernel hookers service for toa module · 33576e37

由 George Zhang 提交于 3月 15, 2019

LVS fullnat will replace network traffic's source ip with its local ip,
and thus the backend servers cannot obtain the real client ip.

To solve this, LVS has introduced the tcp option address (TOA) to store
the essential ip address information in the last tcp ack packet of the
3-way handshake, and the backend servers need to retrieve it from the
packet header.

In this patch, we have introduced the sk_toa_data member in the sock
structure to hold the TOA information. There used to be an in-tree
module for TOA managing, whereas it has now been maintained as an
standalone module.

In this case, the toa module should register its hook function(s) using
the provided interfaces in the hookers module.

TOA in sock structure:

	__be32 sk_toa_data[16];

The hookers module only provides the sk_toa_data placeholder, and the
toa module can use this variable through the layout it needs.

Hook interfaces:

The hookers module replaces the kernel's syn_recv_sock and getname
handler with a stub that chains the toa module's hook function(s) to the
original handling function. The hookers module allows hook functions to
be installed and uninstalled in any order.

toa module:

The external toa module will be provided in separate RPM package.

[xuyu@linux.alibaba.com: amend commit log]
Signed-off-by: NGeorge Zhang <georgezhang@linux.alibaba.com>
Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>

33576e37

ext4: fix NULL pointer dereference while journal is aborted · 0262b2d2

由 Jiufei Xue 提交于 3月 11, 2019

We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0

This is triggered because the NULL pointer handle->h_transaction was
dereferenced in function ext4_update_inode_fsync_trans().
I found that the h_transaction was set to NULL in jbd2__journal_restart
but failed to attached to a new transaction while the journal is aborted.

Fix this by checking the handle before updating the inode.
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0262b2d2

perf probe: Fix getting the kernel map · 38fd2b68

由 Adrian Hunter 提交于 3月 11, 2019

Since commit 4d99e413 ("perf machine: Workaround missing maps for
x86 PTI entry trampolines"), perf tools has been creating more than one
kernel map, however 'perf probe' assumed there could be only one.

Fix by using machine__kernel_map() to get the main kernel map.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Tested-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Xu Yu <xuyu@linux.alibaba.com>
Fixes: 4d99e413 ("perf machine: Workaround missing maps for x86 PTI entry trampolines")
Fixes: d83212d5 ("kallsyms, x86: Export addresses of PTI entry trampolines")
Link: http://lkml.kernel.org/r/2ed432de-e904-85d2-5c36-5897ddc5b23b@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

38fd2b68

14 3月, 2019 1 次提交

nfsd: fix wrong check in write_v4_end_grace() · 969e7b41

由 Yihao Wu 提交于 3月 06, 2019

commit dd838821f0a29781b185cd8fb8e48d5c177bd838 upstream.

Commit 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before
nfsd startup" is trying to fix a NULL dereference issue, but it
mistakenly checks if the nfsd server is started. So fix it.

Fixes: 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before nfsd startup"
Cc: stable@vger.kernel.org
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

969e7b41

12 3月, 2019 1 次提交

virtio_blk: add discard and write zeroes support · 77e6a733

由 Changpeng Liu 提交于 11月 01, 2018

commit 1f23816b8eb8fdc39990abe166c10a18c16f6b21 upstream.

In commit 88c85538, "virtio-blk: add discard and write zeroes features
to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
block specification has been extended to add VIRTIO_BLK_T_DISCARD and
VIRTIO_BLK_T_WRITE_ZEROES commands.  This patch enables support for
discard and write zeroes in the virtio-blk driver when the device
advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
VIRTIO_BLK_F_WRITE_ZEROES.
Signed-off-by: NChangpeng Liu <changpeng.liu@intel.com>
Signed-off-by: NDaniel Verkamp <dverkamp@chromium.org>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

77e6a733

23 2月, 2019 1 次提交

net: crypto set sk to NULL when af_alg_release. · f363be65

由 Mao Wenan 提交于 2月 18, 2019

commit 9060cb719e61b685ec0102574e10337fa5f445ea upstream.

KASAN has found use-after-free in sockfs_setattr.
The existed commit 6d8c50dc ("socket: close race condition between sock_close()
and sockfs_setattr()") is to fix this simillar issue, but it seems to ignore
that crypto module forgets to set the sk to NULL after af_alg_release.

KASAN report details as below:
BUG: KASAN: use-after-free in sockfs_setattr+0x120/0x150
Write of size 4 at addr ffff88837b956128 by task syz-executor0/4186

CPU: 2 PID: 4186 Comm: syz-executor0 Not tainted xxx + #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0xca/0x13e
 print_address_description+0x79/0x330
 ? vprintk_func+0x5e/0xf0
 kasan_report+0x18a/0x2e0
 ? sockfs_setattr+0x120/0x150
 sockfs_setattr+0x120/0x150
 ? sock_register+0x2d0/0x2d0
 notify_change+0x90c/0xd40
 ? chown_common+0x2ef/0x510
 chown_common+0x2ef/0x510
 ? chmod_common+0x3b0/0x3b0
 ? __lock_is_held+0xbc/0x160
 ? __sb_start_write+0x13d/0x2b0
 ? __mnt_want_write+0x19a/0x250
 do_fchownat+0x15c/0x190
 ? __ia32_sys_chmod+0x80/0x80
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 __x64_sys_fchownat+0xbf/0x160
 ? lockdep_hardirqs_on+0x39a/0x5e0
 do_syscall_64+0xc8/0x580
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462589
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89
ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3
48 c7 c1 bc ff ff
ff f7 d8 64 89 01 48
RSP: 002b:00007fb4b2c83c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
RAX: ffffffffffffffda RBX: 000000000072bfa0 RCX: 0000000000462589
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000007
RBP: 0000000000000005 R08: 0000000000001000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb4b2c846bc
R13: 00000000004bc733 R14: 00000000006f5138 R15: 00000000ffffffff

Allocated by task 4185:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x14a/0x350
 sk_prot_alloc+0xf6/0x290
 sk_alloc+0x3d/0xc00
 af_alg_accept+0x9e/0x670
 hash_accept+0x4a3/0x650
 __sys_accept4+0x306/0x5c0
 __x64_sys_accept4+0x98/0x100
 do_syscall_64+0xc8/0x580
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4184:
 __kasan_slab_free+0x12e/0x180
 kfree+0xeb/0x2f0
 __sk_destruct+0x4e6/0x6a0
 sk_destruct+0x48/0x70
 __sk_free+0xa9/0x270
 sk_free+0x2a/0x30
 af_alg_release+0x5c/0x70
 __sock_release+0xd3/0x280
 sock_close+0x1a/0x20
 __fput+0x27f/0x7f0
 task_work_run+0x136/0x1b0
 exit_to_usermode_loop+0x1a7/0x1d0
 do_syscall_64+0x461/0x580
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Syzkaller reproducer:
r0 = perf_event_open(&(0x7f0000000000)={0x0, 0x70, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext}, 0x0, 0x0,
0xffffffffffffffff, 0x0)
r1 = socket$alg(0x26, 0x5, 0x0)
getrusage(0x0, 0x0)
bind(r1, &(0x7f00000001c0)=@ALG={0x26, 'hash\x00', 0x0, 0x0,
'sha256-ssse3\x00'}, 0x80)
r2 = accept(r1, 0x0, 0x0)
r3 = accept4$unix(r2, 0x0, 0x0, 0x0)
r4 = dup3(r3, r0, 0x0)
fchownat(r4, &(0x7f00000000c0)='\x00', 0x0, 0x0, 0x1000)

Fixes: 6d8c50dc ("socket: close race condition between sock_close() and sockfs_setattr()")
Signed-off-by: NMao Wenan <maowenan@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

f363be65

21 2月, 2019 15 次提交

vhost/vsock: fix vhost vsock cid hashing inconsistent · 0077fe01

由 Zha Bin 提交于 1月 08, 2019

commit 7fbe078c37aba3088359c9256c1a1d0c3e39ee81 upstream.

The vsock core only supports 32bit CID, but the Virtio-vsock spec define
CID (dst_cid and src_cid) as u64 and the upper 32bits is reserved as
zero. This inconsistency causes one bug in vhost vsock driver. The
scenarios is:

  0. A hash table (vhost_vsock_hash) is used to map an CID to a vsock
  object. And hash_min() is used to compute the hash key. hash_min() is
  defined as:
  (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
  That means the hash algorithm has dependency on the size of macro
  argument 'val'.
  0. In function vhost_vsock_set_cid(), a 64bit CID is passed to
  hash_min() to compute the hash key when inserting a vsock object into
  the hash table.
  0. In function vhost_vsock_get(), a 32bit CID is passed to hash_min()
  to compute the hash key when looking up a vsock for an CID.

Because the different size of the CID, hash_min() returns different hash
key, thus fails to look up the vsock object for an CID.

To fix this bug, we keep CID as u64 in the IOCTLs and virtio message
headers, but explicitly convert u64 to u32 when deal with the hash table
and vsock core.

Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers")
Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.texSigned-off-by: NZha Bin <zhabin@linux.alibaba.com>
Reviewed-by: NLiu Jiang <gerry@linux.alibaba.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0077fe01

kconfig: Disable x86 clocksource watchdog · 144cdeaf

由 Jiufei Xue 提交于 1月 14, 2019

Unstable tsc will trigger clocksource watchdog and disable itself, as a
result other clocksource will be elected as the current clocksource
which will result in performace issue on our servers.

RHEL7 also disabled this feature for some issues, see changelog:
[x86] disable clocksource watchdog (Prarit Bhargava) [914709]
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

144cdeaf

Revert "x86/tsc: Prepare warp test for TSC adjustment" · 3af4b358

由 Jiufei Xue 提交于 1月 10, 2019

This reverts commit 76d3b851.

The returned value for check_tsc_warp() is useless now, remove it.
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

3af4b358

Revert "x86/tsc: Try to adjust TSC if sync test fails" · 0be05fd5

由 Jiufei Xue 提交于 1月 10, 2019

This reverts commit cc4db268.

When we do hot-add and enable vCPU, the time inside the VM jumps and
then VM stucks.
The dmesg shows like this:
[   48.402948] CPU2 has been hot-added
[   48.413774] smpboot: Booting Node 0 Processor 2 APIC 0x2
[   48.415155] kvm-clock: cpu 2, msr 6b615081, secondary cpu clock
[   48.453690] TSC ADJUST compensate: CPU2 observed 139318776350 warp.  Adjust: 139318776350
[  102.060874] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[  102.060874] clocksource:                       'kvm-clock' wd_now: 1cb1cfc4bf8 wd_last: 1be9588f1fe mask: ffffffffffffffff
[  102.060874] clocksource:                       'tsc' cs_now: 207d794f7e cs_last: 205a32697a mask: ffffffffffffffff
[  102.060874] tsc: Marking TSC unstable due to clocksource watchdog
[  102.070188] KVM setup async PF for cpu 2
[  102.071461] kvm-stealtime: cpu 2, msr 13ba95000
[  102.074530] Will online and init hotplugged CPU: 2

This is because the TSC for the newly added VCPU is initialized to 0
while others are ahead. Guest will do the TSC ADJUST compensate and
cause the time jumps.

Commit bd8fab39("KVM: x86: fix maintaining of kvm_clock stability
on guest CPU hotplug") can fix this problem.  However, the host kernel
version may be older, so do not ajust TSC if sync test fails, just mark
it unstable.
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0be05fd5

block-throttle: enable hierarchical throttling even on traditional hierarchy · 78eef79e

由 Joseph Qi 提交于 12月 12, 2017

ECI may have an use case that configuring each device mapper disk
throttling policy just under root blkio cgroup, but actually using them
in different containers.
Since hierarchical throttling is now only supported on cgroup v2 and ECI
uses cgroup v1, so we have to enable hierarchical throttling on cgroup
v1.
This is ported from redhat 7u, and a year ago Jiufei already ported it
to alikernel 4.9 as well. So I think this change should be acceptable.
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

78eef79e

eci: drivers/virtio: add vring_force_dma_api boot param · a9828e81

由 Eryu Guan 提交于 12月 24, 2018

Prior to xdragon platform 20181230 release (e.g. 0930 release),
vring_use_dma_api() is required to return 'true' unconditionally.

Introduce a new kernel boot parameter called "vring_force_dma_api" to
control the behavior, boot xdragon host with "vring_force_dma_api"
command line to make ENI hotplug work, so that normal ECS hosts keep the
original behavior.
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: NEryu Guan <eguan@linux.alibaba.com>

a9828e81

boot: give rdrand some credit · 96f691ff

由 Arjan van de Ven 提交于 7月 29, 2016

Cherry-pick from clear-linux patches:
https://github.com/clearlinux-pkgs/linux-kvm/0104-give-rdrand-some-credit.patch

try to credit rdrand/rdseed with some entropy

In VMs but even modern hardware, we're super starved for entropy, and while we can
and do wear a tin foil hat, it's very hard to argue that
rdrand and rdtsc add zero entropy.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

96f691ff

NO-UPSTREAM: 9P: always use cached inode to fill in v9fs_vfs_getattr · 93038624

由 Julio Montes 提交于 9月 18, 2017

Cherry-pick from kata-container patches:
https://github.com/kata-containers/packaging/tree/master/kernel/patches/0001-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch

So that if in cache=none mode, we don't have to lookup server that
might not support open-unlink-fstat operation.

fixes https://github.com/01org/cc-oci-runtime/issues/47
fixes https://github.com/01org/cc-oci-runtime/issues/1062Signed-off-by: NJulio Montes <julio.montes@intel.com>
Signed-off-by: NPeng Tao <bergwolf@gmail.com>
Signed-off-by: NEryu Guan <eguan@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

93038624

NEMU: Compile in evged always · 9a86e83e

由 Arjan van de Ven 提交于 8月 10, 2018

Cherry-pick from kata-container patches:
https://github.com/kata-containers/packaging/tree/master/kernel/patches/0002-Compile-in-evged-always.patch

We need evged for NEMU (and in general for hw reduced)

The config option cannot be set normally since it breaks all
regular systems, and hardware reduced is really a runtime choice.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NEryu Guan <eguan@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

9a86e83e

ext4: fix reserved cluster accounting at page invalidation time · 27d2b135

由 Eric Whitney 提交于 10月 01, 2018

commit f456767d3391e9f7d9d25a2e7241d75676dc19da upstream.

Add new code to count canceled pending cluster reservations on bigalloc
file systems and to reduce the cluster reservation count on all file
systems using delayed allocation.  This replaces old code in
ext4_da_page_release_reservations that was incorrect.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

27d2b135

ext4: adjust reserved cluster count when removing extents · b2116b09

由 Eric Whitney 提交于 10月 01, 2018

commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream.

Modify ext4_ext_remove_space() and the code it calls to correct the
reserved cluster count for pending reservations (delayed allocated
clusters shared with allocated blocks) when a block range is removed
from the extent tree. Pending reservations may be found for the clusters
at the ends of written or unwritten extents when a block range is removed.
If a physical cluster at the end of an extent is freed, it's necessary
to increment the reserved cluster count to maintain correct accounting
if the corresponding logical cluster is shared with at least one
delayed and unwritten extent as found in the extents status tree.

Add a new function, ext4_rereserve_cluster(), to reapply a reservation
on a delayed allocated cluster sharing blocks with a freed allocated
cluster. To avoid ENOSPC on reservation, a flag is applied to
ext4_free_blocks() to briefly defer updating the freeclusters counter
when an allocated cluster is freed. This prevents another thread
from allocating the freed block before the reservation can be reapplied.

Redefine the partial cluster object as a struct to carry more state
information and to clarify the code using it.

Adjust the conditional code structure in ext4_ext_remove_space to
reduce the indentation level in the main body of the code to improve
readability.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

b2116b09

ext4: reduce reserved cluster count by number of allocated clusters · 32b736a6

由 Eric Whitney 提交于 10月 01, 2018

commit b6bf9171ef5c37b66d446378ba63af5339a56a97 upstream.

Ext4 does not always reduce the reserved cluster count by the number
of clusters allocated when mapping a delayed extent. It sometimes
adds back one or more clusters after allocation if delalloc blocks
adjacent to the range allocated by ext4_ext_map_blocks() share the
clusters newly allocated for that range. However, this overcounts
the number of clusters needed to satisfy future mapping requests
(holding one or more reservations for clusters that have already been
allocated) and premature ENOSPC and quota failures, etc., result.

Ext4 also does not reduce the reserved cluster count when allocating
clusters for non-delayed allocated writes that have previously been
reserved for delayed writes. This also results in overcounts.

To make it possible to handle reserved cluster accounting for
fallocated regions in the same manner as used for other non-delayed
writes, do the reserved cluster accounting for them at the time of
allocation. In the current code, this is only done later when a
delayed extent sharing the fallocated region is finally mapped.

Address comment correcting handling of unsigned long long constant
from Jan Kara's review of RFC version of this patch.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

32b736a6

ext4: fix reserved cluster accounting at delayed write time · 99490366

由 Eric Whitney 提交于 10月 01, 2018

commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.

The code in ext4_da_map_blocks sometimes reserves space for more
delayed allocated clusters than it should, resulting in premature
ENOSPC, exceeded quota, and inaccurate free space reporting.

Fix this by checking for written and unwritten blocks shared in the
same cluster with the newly delayed allocated block.  A cluster
reservation should not be made for a cluster for which physical space
has already been allocated.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

99490366

ext4: add new pending reservation mechanism · 1fc0a831

由 Eric Whitney 提交于 10月 01, 2018

commit 1dc0aa46e74a3366e12f426b7caaca477853e9c3 upstream.

Add new pending reservation mechanism to help manage reserved cluster
accounting.  Its primary function is to avoid the need to read extents
from the disk when invalidating pages as a result of a truncate, punch
hole, or collapse range operation.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

1fc0a831

ext4: generalize extents status tree search functions · 667f9459

由 Eric Whitney 提交于 10月 01, 2018

commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream.

Ext4 contains a few functions that are used to search for delayed
extents or blocks in the extents status tree.  Rather than duplicate
code to add new functions to search for extents with different status
values, such as written or a combination of delayed and unwritten,
generalize the existing code to search for caller-specified extents
status values.  Also, move this code into extents_status.c where it
is better associated with the data structures it operates upon, and
where it can be more readily used to implement new extents status tree
functions that might want a broader scope for i_es_lock.

Three missing static specifiers in RFC version of patch reported and
fixed by Fengguang Wu <fengguang.wu@intel.com>.
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>

667f9459

20 2月, 2019 15 次提交

G

Linux 4.19.24 · f287634f
由 Greg Kroah-Hartman 提交于 2月 20, 2019

f287634f

mm: proc: smaps_rollup: fix pss_locked calculation · dd5f4d06

由 Sandeep Patil 提交于 2月 12, 2019

commit 27dd768ed8db48beefc4d9e006c58e7a00342bde upstream.

The 'pss_locked' field of smaps_rollup was being calculated incorrectly.
It accumulated the current pss everytime a locked VMA was found.  Fix
that by adding to 'pss_locked' the same time as that of 'pss' if the vma
being walked is locked.

Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com
Fixes: 493b0e9d ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: NSandeep Patil <sspatil@android.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org>	[4.14.x, 4.19.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

dd5f4d06

drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set · 6a204bd5

由 Joonas Lahtinen 提交于 2月 07, 2019

commit 2e7bd10e05afb866b5fb13eda25095c35d7a27cc upstream.

Make sure the underlying VMA in the process address space is the
same as it was during vm_mmap to avoid applying WC to wrong VMA.

A more long-term solution would be to have vm_mmap_locked variant
in linux/mmap.h for when caller wants to hold mmap_sem for an
extended duration.

v2:
- Refactor the compare function

Fixes: 1816f923 ("drm/i915: Support creation of unbound wc user mappings for objects")
Reported-by: NAdam Zabrocki <adamza@microsoft.com>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-1-joonas.lahtinen@linux.intel.com
(cherry picked from commit 5c4604e757ba9b193b09768d75a7d2105a5b883f)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6a204bd5

drm/i915: Block fbdev HPD processing during suspend · 4631e0b4

由 Lyude Paul 提交于 1月 29, 2019

commit e8a8fedd57fdcebf0e4f24ef0fc7e29323df8e66 upstream.

When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.

However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.

This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees pretty much every suspend and resume cycle,
drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn,
a connector hotplug will occur. Now it's Rute Goldberg time: when the
connector hotplug occurs, i915 reprobes /all/ of the connectors,
including eDP. However, eDP probing requires that we power on the panel
VDD which in turn, grabs a wakeref to the appropriate power domain on
the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
things start breaking, since this all happens before
intel_power_domains_enable() is called we end up leaking the wakeref
that was acquired and never releasing it later. Come next suspend/resume
cycle, this causes us to fail to shut down the GPU properly, which
causes it not to resume properly and die a horrible complicated death.

(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).

We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.

This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.

Changes since v2:
* Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
  (Chris Wilson)
* Don't call drm_fb_helper_hotplug_event() in
  intel_fbdev_output_poll_changed() under lock (Chris Wilson)
* Always set ifbdev->hpd_waiting (Chris Wilson)
Signed-off-by: NLyude Paul <lyude@redhat.com>
Fixes: 0e32b39c ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.17+
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-2-lyude@redhat.com
(cherry picked from commit fe5ec65668cdaa4348631d8ce1766eed43b33c10)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4631e0b4

drm/vkms: Fix license inconsistent · de48b5f3

由 Rodrigo Siqueira 提交于 2月 06, 2019

commit 7fd56e0260a22c0cfaf9adb94a2427b76e239dd0 upstream.

Fixes license inconsistent related to the VKMS driver and remove the
redundant boilerplate comment.

Fixes: 854502fa ("drm/vkms: Add basic CRTC initialization")

Cc: stable@vger.kernel.org
Signed-off-by: NRodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190206140116.7qvy2lpwbcd7wds6@smtp.gmail.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

de48b5f3

drm: Use array_size() when creating lease · 3312e0ae

由 Matthew Wilcox 提交于 2月 14, 2019

commit 69ef943dbc14b21987c79f8399ffea08f9a1b446 upstream.

Passing an object_count of sufficient size will make
object_count * 4 wrap around to be very small, then a later function
will happily iterate off the end of the object_ids array.  Using
array_size() will saturate at SIZE_MAX, the kmalloc() will fail and
we'll return an -ENOMEM to the norty userspace.

Fixes: 62884cd3 ("drm: Add four ioctls for managing drm mode object leases [v7]")
Signed-off-by: NMatthew Wilcox <willy@infradead.org>
Acked-by: NKees Cook <keescook@chromium.org>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: NDave Airlie <airlied@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3312e0ae

dm thin: fix bug where bio that overwrites thin block ignores FUA · 029a38f8

由 Nikos Tsironis 提交于 2月 14, 2019

commit 4ae280b4ee3463fa57bbe6eede26b97daff8a0f1 upstream.

When provisioning a new data block for a virtual block, either because
the block was previously unallocated or because we are breaking sharing,
if the whole block of data is being overwritten the bio that triggered
the provisioning is issued immediately, skipping copying or zeroing of
the data block.

When this bio completes the new mapping is inserted in to the pool's
metadata by process_prepared_mapping(), where the bio completion is
signaled to the upper layers.

This completion is signaled without first committing the metadata.  If
the bio in question has the REQ_FUA flag set and the system crashes
right after its completion and before the next metadata commit, then the
write is lost despite the REQ_FUA flag requiring that I/O completion for
this request must only be signaled after the data has been committed to
non-volatile storage.

Fix this by deferring the completion of overwrite bios, with the REQ_FUA
flag set, until after the metadata has been committed.

Cc: stable@vger.kernel.org
Signed-off-by: NNikos Tsironis <ntsironis@arrikto.com>
Acked-by: NJoe Thornber <ejt@redhat.com>
Acked-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

029a38f8

dm crypt: don't overallocate the integrity tag space · 3ec93eb3

由 Mikulas Patocka 提交于 2月 08, 2019

commit ff0c129d3b5ecb3df7c8f5e2236582bf745b6c5f upstream.

bio_sectors() returns the value in the units of 512-byte sectors (no
matter what the real sector size of the device).  dm-crypt multiplies
bio_sectors() by on_disk_tag_size to calculate the space allocated for
integrity tags.  If dm-crypt is running with sector size larger than
512b, it allocates more data than is needed.

Device Mapper trims the extra space when passing the bio to
dm-integrity, so this bug didn't result in any visible misbehavior.
But it must be fixed to avoid wasteful memory allocation for the block
integrity payload.

Fixes: ef43aa38 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3ec93eb3

x86/a.out: Clear the dump structure initially · b2778ef8

由 Borislav Petkov 提交于 2月 12, 2019

commit 10970e1b4be9c74fce8ab6e3c34a7d718f063f2c upstream.

dump_thread32() in aout_core_dump() does not clear the user32 structure
allocated on the stack as the first thing on function entry.

As a result, the dump.u_comm, dump.u_ar0 and dump.signal which get
assigned before the clearing, get overwritten.

Rename that function to fill_dump() to make it clear what it does and
call it first thing.

This was caught while staring at a patch by Derek Robson
<robsonde@gmail.com>.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Derek Robson <robsonde@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Matz <matz@suse.de>
Cc: x86@kernel.org
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190202005512.3144-1-robsonde@gmail.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2778ef8

md/raid1: don't clear bitmap bits on interrupted recovery. · ddf96641

由 Nate Dailey 提交于 2月 07, 2019

commit dfcc34c99f3ebc16b787b118763bf9cb6b1efc7a upstream.

sync_request_write no longer submits writes to a Faulty device. This has
the unfortunate side effect that bitmap bits can be incorrectly cleared
if a recovery is interrupted (previously, end_sync_write would have
prevented this). This means the next recovery may not copy everything
it should, potentially corrupting data.

Add a function for doing the proper md_bitmap_end_sync, called from
end_sync_write and the Faulty case in sync_request_write.

backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync
Cc: stable@vger.kernel.org 4.14+
Fixes: 0c9d5b12 ("md/raid1: avoid reusing a resync bio after error handling.")
Reviewed-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Tested-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: NNate Dailey <nate.dailey@stratus.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ddf96641

signal: Restore the stop PTRACE_EVENT_EXIT · a2b3e2c0

由 Eric W. Biederman 提交于 2月 11, 2019

commit cf43a757fd49442bc38f76088b70c2299eed2c2f upstream.

In the middle of do_exit() there is there is a call
"ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
for the debugger to release the task or SIGKILL to be delivered.

Skipping past dequeue_signal when we know a fatal signal has already
been delivered resulted in SIGKILL remaining pending and
TIF_SIGPENDING remaining set.  This in turn caused the
scheduler to not sleep in PTACE_EVENT_EXIT as it figured
a fatal signal was pending.  This also caused ptrace_freeze_traced
in ptrace_check_attach to fail because it left a per thread
SIGKILL pending which is what fatal_signal_pending tests for.

This difference in signal state caused strace to report
strace: Exit of unknown pid NNNNN ignored

Therefore update the signal handling state like dequeue_signal
would when removing a per thread SIGKILL, by removing SIGKILL
from the per thread signal mask and clearing TIF_SIGPENDING.
Acked-by: NOleg Nesterov <oleg@redhat.com>
Reported-by: NOleg Nesterov <oleg@redhat.com>
Reported-by: NIvan Delalande <colona@arista.com>
Cc: stable@vger.kernel.org
Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

a2b3e2c0

scsi: sd: fix entropy gathering for most rotational disks · 0396cf55

由 James Bottomley 提交于 2月 12, 2019

commit e4a056987c86f402f1286e050b1dee3f4ce7c7eb upstream.

The problem is that the default for MQ is not to gather entropy, whereas
the default for the legacy queue was always to gather it. The original
attempt to fix entropy gathering for rotational disks under MQ added an
else branch in sd_read_block_characteristics(). Unfortunately, the entire
check isn't reached if the device has no characteristics VPD page. Since
this page was only introduced in SBC-3 and its optional anyway, most less
expensive rotational disks don't have one, meaning they all stopped
gathering entropy when we made MQ the default. In a wholly unrelated
change, openssl and openssh won't function until the random number
generator is initialised, meaning lots of people have been seeing large
delays before they could log into systems with default MQ kernels due to
this lack of entropy, because it now can take tens of minutes to initialise
the kernel random number generator.

The fix is to set the non-rotational and add-randomness flags
unconditionally early on in the disk initialization path, so they can be
reset only if the device actually reports being non-rotational via the VPD
page.
Reported-by: NMikael Pettersson <mikpelinux@gmail.com>
Fixes: 83e32a59 ("scsi: sd: Contribute to randomness when running rotational device")
Cc: stable@vger.kernel.org
Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NXuewei Zhang <xueweiz@google.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0396cf55

x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls · cdc35685

由 Hedi Berriche 提交于 2月 13, 2019

commit f331e766c4be33f4338574f3c9f7f77e98ab4571 upstream.

Calls into UV firmware must be protected against concurrency, expose the
efi_runtime_lock to the UV platform, and use it to serialise UV BIOS
calls.
Signed-off-by: NHedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NRuss Anderson <rja@hpe.com>
Reviewed-by: NDimitri Sivanich <sivanich@hpe.com>
Reviewed-by: NMike Travis <mike.travis@hpe.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: stable@vger.kernel.org # v4.9+
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190213193413.25560-5-hedi.berriche@hpe.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

cdc35685

tracing/uprobes: Fix output for multiple string arguments · 45649b99

由 Andreas Ziegler 提交于 1月 16, 2019

commit 0722069a5374b904ec1a67f91249f90e1cfae259 upstream.

When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.

This is best explained in an example:

Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):

$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
    /sys/kernel/debug/tracing/uprobe_events

When the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:

  [...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"

Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.

The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() cleary states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.

Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.

Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.

Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 5baaa59e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NAndreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

45649b99

s390/zcrypt: fix specification exception on z196 during ap probe · 88e1e66a

由 Harald Freudenberger 提交于 1月 23, 2019

commit 8f9aca0c45322a807a343fc32f95f2500f83b9ae upstream.

The older machines don't have the QCI instruction available.
With support for up to 256 crypto cards the probing of each
card has been extended to check card ids from 0 up to 255.
For machines with QCI support there is a filter limiting the
range of probed cards. The older machines (z196 and older)
don't have this filter and so since support for 256 cards is
in the driver all cards are probed. However, these machines
also require to have the card id fit into 6 bits. Exceeding
this limit results in a specification exception which happens
on every kernel startup even when there is no crypto configured
and used at all.

This fix limits the range of probed crypto cards to 64 if
there is no QCI instruction available to obey to the older
ap architecture and so fixes the specification exceptions
on z196 machines.

Cc: stable@vger.kernel.org # v4.17+
Fixes: af4a7227 ("s390/zcrypt: Support up to 256 crypto adapters.")
Signed-off-by: NHarald Freudenberger <freude@linux.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

88e1e66a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功