提交 · 4b39051363a0a374afd073e2fcaaa8602b1befa2 · openanolis / cloud-kernel

27 3月, 2019 31 次提交

Bluetooth: Fix decrementing reference count twice in releasing socket · 4b390513

由 Myungho Jung 提交于 2月 02, 2019

commit e20a2e9c42c9e4002d9e338d74e7819e88d77162 upstream.

When releasing socket, it is possible to enter hci_sock_release() and
hci_sock_dev_event(HCI_DEV_UNREG) at the same time in different thread.
The reference count of hdev should be decremented only once from one of
them but if storing hdev to local variable in hci_sock_release() before
detached from socket and setting to NULL in hci_sock_dev_event(),
hci_dev_put(hdev) is unexpectedly called twice. This is resolved by
referencing hdev from socket after bt_sock_unlink() in
hci_sock_release().

Reported-by: syzbot+fdc00003f4efff43bc5b@syzkaller.appspotmail.com
Signed-off-by: NMyungho Jung <mhjungk@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4b390513

Bluetooth: hci_uart: Check if socket buffer is ERR_PTR in h4_recv_buf() · 4e0ca4bf

由 Myungho Jung 提交于 1月 22, 2019

commit 1dc2d785156cbdc80806c32e8d2c7c735d0b4721 upstream.

h4_recv_buf() callers store the return value to socket buffer and
recursively pass the buffer to h4_recv_buf() without protection. So,
ERR_PTR returned from h4_recv_buf() can be dereferenced, if called again
before setting the socket buffer to NULL from previous error. Check if
skb is ERR_PTR in h4_recv_buf().

Reported-by: syzbot+017a32f149406df32703@syzkaller.appspotmail.com
Signed-off-by: NMyungho Jung <mhjungk@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4e0ca4bf

media: v4l2-ctrls.c/uvc: zero v4l2_event · 6bef442e

由 Hans Verkuil 提交于 12月 18, 2018

commit f45f3f753b0a3d739acda8e311b4f744d82dc52a upstream.

Control events can leak kernel memory since they do not fully zero the
event. The same code is present in both v4l2-ctrls.c and uvc_ctrl.c, so
fix both.

It appears that all other event code is properly zeroing the structure,
it's these two places.
Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl>
Reported-by: syzbot+4f021cf3697781dbd9fb@syzkaller.appspotmail.com
Reviewed-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6bef442e

ext4: brelse all indirect buffer in ext4_ind_remove_space() · d12d8641

由 zhangyi (F) 提交于 3月 23, 2019

commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217 upstream.

All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.

 - Create and mount an empty ext4 filesystem without extent and quota
   features,
 - quotacheck and enable the user & group quota,
 - Create some files and write some data to them, and then punch hole
   to some files of them, it may trigger the buffer leak problem
   mentioned above.
 - Disable quota and run quotacheck again, it will create two new
   aquota files and write the checked quota information to them, which
   probably may reuse the freed indirect block(the buffer and page
   cache was not freed) as data block.
 - Enable quota again, it will invoke
   vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
   buffers and pagecache. Unfortunately, because of the buffer of quota
   data block is still referenced, quota code cannot read the up to date
   quota info from the device and lead to quota information corruption.

This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.

This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d12d8641

ext4: fix data corruption caused by unaligned direct AIO · 76c9ee6b

由 Lukas Czerner 提交于 3月 14, 2019

commit 372a03e01853f860560eade508794dd274e9b390 upstream.

Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.

However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.

This is (very simplified) reproducer from Frank

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[1]);
    io_submit(io_ctx, 1, &iocbs[2]);

    io_getevents(io_ctx, 2, 2, events, NULL);

Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31

With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
*
0000b200
Reported-by: NFrank Sorenson <fsorenso@redhat.com>
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcec ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

76c9ee6b

ext4: fix NULL pointer dereference while journal is aborted · 558331d0

由 Jiufei Xue 提交于 3月 14, 2019

commit fa30dde38aa8628c73a6dded7cb0bba38c27b576 upstream.

We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0

This is triggered because the NULL pointer handle->h_transaction was
dereferenced in function ext4_update_inode_fsync_trans().
I found that the h_transaction was set to NULL in jbd2__journal_restart
but failed to attached to a new transaction while the journal is aborted.

Fix this by checking the handle before updating the inode.

Fixes: b436b9be ("ext4: Wait for proper transaction commit on fsync")
Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Cc: stable@kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

558331d0

ALSA: ac97: Fix of-node refcount unbalance · ff7a1f81

由 Takashi Iwai 提交于 2月 18, 2019

commit 31d2350d602511efc9ef626b848fe521233b0387 upstream.

ac97_of_get_child_device() take the refcount of the node explicitly
via of_node_get(), but this leads to an unbalance.  The
for_each_child_of_node() loop itself takes the refcount for each
iteration node, hence you don't need to take the extra refcount
again.

Fixes: 2225a3e6 ("ALSA: ac97: add codecs devicetree binding")
Reviewed-by: NRobert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ff7a1f81

ALSA: hda/ca0132 - make pci_iounmap() call conditional · c8e91d75

由 Arnd Bergmann 提交于 12月 10, 2018

commit 1e73359a24fad529b0794515b46cbfff99e5fbe6 upstream.

When building without CONFIG_PCI, we can (depending on the architecture)
get a link failure:

ERROR: "pci_iounmap" [sound/pci/hda/snd-hda-codec-ca0132.ko] undefined!

Adding a compile-time check for PCI gets it to work correctly on
32-bit ARM.

Fixes: d99501b8575d ("ALSA: hda/ca0132 - Call pci_iounmap() instead of iounmap()")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c8e91d75

ALSA: x86: Fix runtime PM for hdmi-lpe-audio · 4a767459

由 Ville Syrjälä 提交于 10月 24, 2018

commit 8dfb839cfe737a17def8e5f88ee13c295230364a upstream.

Commit 46e831abe864 ("drm/i915/lpe: Mark LPE audio runtime pm as
"no callbacks"") broke runtime PM with lpe audio. We can no longer
runtime suspend the GPU since the sysfs power/control for the
lpe-audio device no longer exists and the device is considered
always active. We can fix this by not marking the device as
active.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Fixes: 46e831abe864 ("drm/i915/lpe: Mark LPE audio runtime pm as "no callbacks"")
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181024154825.18185-1-ville.syrjala@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Acked-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4a767459

SMB3: Fix SMB3.1.1 guest mounts to Samba · 38bd575b

由 Steve French 提交于 3月 22, 2019

commit 8c11a607d1d9cd6e7f01fd6b03923597fb0ef95a upstream.

Workaround problem with Samba responses to SMB3.1.1
null user (guest) mounts.  The server doesn't set the
expected flag in the session setup response so we have
to do a similar check to what is done in smb3_validate_negotiate
where we also check if the user is a null user (but not sec=krb5
since username might not be passed in on mount for Kerberos case).

Note that the commit below tightened the conditions and forced signing
for the SMB2-TreeConnect commands as per MS-SMB2.
However, this should only apply to normal user sessions and not for
cases where there is no user (even if server forgets to set the flag
in the response) since we don't have anything useful to sign with.
This is especially important now that the more secure SMB3.1.1 protocol
is in the default dialect list.

An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed
the guest mounts to Windows.

    Fixes: 6188f28b ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: NPaulo Alcantara <palcantara@suse.de>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

38bd575b

irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp · aacf2cc8

由 Rasmus Villemoes 提交于 3月 12, 2019

commit 89dc891792c2e046b030f87600109c22209da32e upstream.

The lpi_range_list is supposed to be sorted in ascending order of
->base_id (at least if the range merging is to work), but the current
comparison function returns a positive value if rb->base_id >
ra->base_id, which means that list_sort() will put A after B in that
case - and vice versa, of course.

Fixes: 880cb3cd (irqchip/gic-v3-its: Refactor LPI allocator)
Cc: stable@vger.kernel.org (v4.19+)
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

aacf2cc8

objtool: Move objtool_file struct off the stack · daaeeca9

由 Josh Poimboeuf 提交于 3月 18, 2019

commit 0c671812f152b628bd87c0af49da032cc2a2c319 upstream.

Objtool uses over 512k of stack, thanks to the hash table embedded in
the objtool_file struct.  This causes an unnecessarily large stack
allocation and breaks users with low stack limits.

Move the struct off the stack.

Fixes: 042ba73f ("objtool: Add several performance improvements")
Reported-by: NVassili Karpov <moosotc@gmail.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/df92dcbc4b84b02ffa252f46876df125fb56e2d7.1552954176.git.jpoimboe@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

daaeeca9

perf probe: Fix getting the kernel map · 37c6f808

由 Adrian Hunter 提交于 3月 04, 2019

commit eaeffeb9838a7c0dec981d258666bfcc0fa6a947 upstream.

Since commit 4d99e413 ("perf machine: Workaround missing maps for
x86 PTI entry trampolines"), perf tools has been creating more than one
kernel map, however 'perf probe' assumed there could be only one.

Fix by using machine__kernel_map() to get the main kernel map.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Tested-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Xu Yu <xuyu@linux.alibaba.com>
Fixes: 4d99e413 ("perf machine: Workaround missing maps for x86 PTI entry trampolines")
Fixes: d83212d5 ("kallsyms, x86: Export addresses of PTI entry trampolines")
Link: http://lkml.kernel.org/r/2ed432de-e904-85d2-5c36-5897ddc5b23b@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

37c6f808

cifs: allow guest mounts to work for smb3.11 · 14c52aca

由 Ronnie Sahlberg 提交于 3月 21, 2019

commit e71ab2aa06f731a944993120b0eef1556c63b81c upstream.

Fix Guest/Anonymous sessions so that they work with SMB 3.11.

The commit noted below tightened the conditions and forced signing for
the SMB2-TreeConnect commands as per MS-SMB2.
However, this should only apply to normal user sessions and not for
Guest/Anonumous sessions.

Fixes: 6188f28b ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

14c52aca

futex: Ensure that futex address is aligned in handle_futex_death() · 36d52f5b

由 Chen Jie 提交于 3月 15, 2019

commit 5a07168d8d89b00fe1760120714378175b3ef992 upstream.

The futex code requires that the user space addresses of futexes are 32bit
aligned. sys_futex() checks this in futex_get_keys() but the robust list
code has no alignment check in place.

As a consequence the kernel crashes on architectures with strict alignment
requirements in handle_futex_death() when trying to cmpxchg() on an
unaligned futex address which was retrieved from the robust list.

[ tglx: Rewrote changelog, proper sizeof() based alignement check and add
  	comment ]

Fixes: 0771dfef ("[PATCH] lightweight robust futexes: core")
Signed-off-by: NChen Jie <chenjie6@huawei.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: <dvhart@infradead.org>
Cc: <peterz@infradead.org>
Cc: <zengweilin@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

36d52f5b

scsi: ibmvscsi: Fix empty event pool access during host removal · 837becb3

由 Tyrel Datwyler 提交于 3月 20, 2019

commit 7f5203c13ba8a7b7f9f6ecfe5a4d5567188d7835 upstream.

The event pool used for queueing commands is destroyed fairly early in the
ibmvscsi_remove() code path. Since, this happens prior to the call so
scsi_remove_host() it is possible for further calls to queuecommand to be
processed which manifest as a panic due to a NULL pointer dereference as
seen here:

PANIC: "Unable to handle kernel paging request for data at address
0x00000000"

Context process backtrace:

DSISR: 0000000042000000 ????Syscall Result: 0000000000000000
4 [c000000002cb3820] memcpy_power7 at c000000000064204
[Link Register] [c000000002cb3820] ibmvscsi_send_srp_event at d000000003ed14a4
5 [c000000002cb3920] ibmvscsi_send_srp_event at d000000003ed14a4 [ibmvscsi] ?(unreliable)
6 [c000000002cb39c0] ibmvscsi_queuecommand at d000000003ed2388 [ibmvscsi]
7 [c000000002cb3a70] scsi_dispatch_cmd at d00000000395c2d8 [scsi_mod]
8 [c000000002cb3af0] scsi_request_fn at d00000000395ef88 [scsi_mod]
9 [c000000002cb3be0] __blk_run_queue at c000000000429860
10 [c000000002cb3c10] blk_delay_work at c00000000042a0ec
11 [c000000002cb3c40] process_one_work at c0000000000dac30
12 [c000000002cb3cd0] worker_thread at c0000000000db110
13 [c000000002cb3d80] kthread at c0000000000e3378
14 [c000000002cb3e30] ret_from_kernel_thread at c00000000000982c

The kernel buffer log is overfilled with this log:

[11261.952732] ibmvscsi: found no event struct in pool!

This patch reorders the operations during host teardown. Start by calling
the SRP transport and Scsi_Host remove functions to flush any outstanding
work and set the host offline. LLDD teardown follows including destruction
of the event pool, freeing the Command Response Queue (CRQ), and unmapping
any persistent buffers. The event pool destruction is protected by the
scsi_host lock, and the pool is purged prior of any requests for which we
never received a response. Finally, move the removal of the scsi host from
our global list to the end so that the host is easily locatable for
debugging purposes during teardown.

Cc: <stable@vger.kernel.org> # v2.6.12+
Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

837becb3

scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton · 04809b22

由 Tyrel Datwyler 提交于 3月 20, 2019

commit 7205981e045e752ccf96cf6ddd703a98c59d4339 upstream.

For each ibmvscsi host created during a probe or destroyed during a remove
we either add or remove that host to/from the global ibmvscsi_head
list. This runs the risk of concurrent modification.

This patch adds a simple spinlock around the list modification calls to
prevent concurrent updates as is done similarly in the ibmvfc driver and
ipr driver.

Fixes: 32d6e4b6 ("scsi: ibmvscsi: add vscsi hosts to global list_head")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

04809b22

powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038 · b8ea151a

由 Michael Ellerman 提交于 3月 14, 2019

commit b5b4453e7912f056da1ca7572574cada32ecb60c upstream.

Jakub Drnec reported:
  Setting the realtime clock can sometimes make the monotonic clock go
  back by over a hundred years. Decreasing the realtime clock across
  the y2k38 threshold is one reliable way to reproduce. Allegedly this
  can also happen just by running ntpd, I have not managed to
  reproduce that other than booting with rtc at >2038 and then running
  ntp. When this happens, anything with timers (e.g. openjdk) breaks
  rather badly.

And included a test case (slightly edited for brevity):
  #define _POSIX_C_SOURCE 199309L
  #include <stdio.h>
  #include <time.h>
  #include <stdlib.h>
  #include <unistd.h>

  long get_time(void) {
    struct timespec tp;
    clock_gettime(CLOCK_MONOTONIC, &tp);
    return tp.tv_sec + tp.tv_nsec / 1000000000;
  }

  int main(void) {
    long last = get_time();
    while(1) {
      long now = get_time();
      if (now < last) {
        printf("clock went backwards by %ld seconds!\n", last - now);
      }
      last = now;
      sleep(1);
    }
    return 0;
  }

Which when run concurrently with:
 # date -s 2040-1-1
 # date -s 2037-1-1

Will detect the clock going backward.

The root cause is that wtom_clock_sec in struct vdso_data is only a
32-bit signed value, even though we set its value to be equal to
tk->wall_to_monotonic.tv_sec which is 64-bits.

Because the monotonic clock starts at zero when the system boots the
wall_to_montonic.tv_sec offset is negative for current and future
dates. Currently on a freshly booted system the offset will be in the
vicinity of negative 1.5 billion seconds.

However if the wall clock is set past the Y2038 boundary, the offset
from wall to monotonic becomes less than negative 2^31, and no longer
fits in 32-bits. When that value is assigned to wtom_clock_sec it is
truncated and becomes positive, causing the VDSO assembly code to
calculate CLOCK_MONOTONIC incorrectly.

That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
it is not meant to do. Worse, if the time is then set back before the
Y2038 boundary CLOCK_MONOTONIC will jump backward.

We can fix it simply by storing the full 64-bit offset in the
vdso_data, and using that in the VDSO assembly code. We also shuffle
some of the fields in vdso_data to avoid creating a hole.

The original commit that added the CLOCK_MONOTONIC support to the VDSO
did actually use a 64-bit value for wtom_clock_sec, see commit
a7f290da ("[PATCH] powerpc: Merge vdso's and add vdso support to
32 bits kernel") (Nov 2005). However just 3 days later it was
converted to 32-bits in commit 0c37ec2a ("[PATCH] powerpc: vdso
fixes (take #2)"), and the bug has existed since then AFAICS.

Fixes: 0c37ec2a ("[PATCH] powerpc: vdso fixes (take #2)")
Cc: stable@vger.kernel.org # v2.6.15+
Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.czReported-by: NJakub Drnec <jaydee@email.cz>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b8ea151a

MIPS: Fix kernel crash for R6 in jump label branch function · 9d91069f

由 Archer Yan 提交于 3月 08, 2019

commit 47c25036b60f27b86ab44b66a8861bcf81cde39b upstream.

Insert Branch instruction instead of NOP to make sure assembler don't
patch code in forbidden slot. In jump label function, it might
be possible to patch Control Transfer Instructions(CTIs) into
forbidden slot, which will generate Reserved Instruction exception
in MIPS release 6.
Signed-off-by: NArcher Yan <ayan@wavecomp.com>
Reviewed-by: NPaul Burton <paul.burton@mips.com>
[paul.burton@mips.com:
  - Add MIPS prefix to subject.
  - Mark for stable from v4.0, which introduced r6 support, onwards.]
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9d91069f

MIPS: Ensure ELF appended dtb is relocated · 6e74961b

由 Yasha Cherikovsky 提交于 3月 08, 2019

commit 3f0a53bc6482fb09770982a8447981260ea258dc upstream.

This fixes booting with the combination of CONFIG_RELOCATABLE=y
and CONFIG_MIPS_ELF_APPENDED_DTB=y.

Sections that appear after the relocation table are not relocated
on system boot (except .bss, which has special handling).

With CONFIG_MIPS_ELF_APPENDED_DTB, the dtb is part of the
vmlinux ELF, so it must be relocated together with everything else.

Fixes: 069fd766 ("MIPS: Reserve space for relocation table")
Signed-off-by: NYasha Cherikovsky <yasha.che3@gmail.com>
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6e74961b

mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction. · 56bcf3df

由 Yifeng Li 提交于 3月 05, 2019

commit 5f5f67da9781770df0403269bc57d7aae608fecd upstream.

Timekeeping IRQs from CS5536 MFGPT are routed to i8259, which then
triggers the "cascade" IRQ on MIPS CPU. Without IRQF_NO_SUSPEND in
cascade_irqaction, MFGPT interrupts will be masked in suspend mode,
and the machine would be unable to resume once suspended.

Previously, MIPS IRQs were not disabled properly, so the original
code appeared to work. Commit a3e6c1ef ("MIPS: IRQ: Fix disable_irq on
CPU IRQs") uncovers the bug. To fix it, add IRQF_NO_SUSPEND to
cascade_irqaction.

This commit is functionally identical to 0add9c2f ("MIPS:
Loongson-3: Add IRQF_NO_SUSPEND to Cascade irqaction"), but it forgot
to apply the same fix to Loongson2.
Signed-off-by: NYifeng Li <tomli@tomli.me>
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

56bcf3df

udf: Fix crash on IO error during truncate · c72e90d9

由 Jan Kara 提交于 3月 11, 2019

commit d3ca4651d05c0ff7259d087d8c949bcf3e14fb46 upstream.

When truncate(2) hits IO error when reading indirect extent block the
code just bugs with:

kernel BUG at linux-4.15.0/fs/udf/truncate.c:249!
...

Fix the problem by bailing out cleanly in case of IO error.

CC: stable@vger.kernel.org
Reported-by: Njean-luc malet <jeanluc.malet@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c72e90d9

libceph: wait for latest osdmap in ceph_monc_blacklist_add() · 9cae232a

由 Ilya Dryomov 提交于 3月 20, 2019

commit bb229bbb3bf63d23128e851a1f3b85c083178fa1 upstream.

Because map updates are distributed lazily, an OSD may not know about
the new blacklist for quite some time after "osd blacklist add" command
is completed.  This makes it possible for a blacklisted but still alive
client to overwrite a post-blacklist update, resulting in data
corruption.

Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
the post-blacklist epoch for all post-blacklist requests ensures that
all such requests "wait" for the blacklist to come into force on their
respective OSDs.

Cc: stable@vger.kernel.org
Fixes: 6305a3b4 ("libceph: support for blacklisting clients")
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9cae232a

iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE · 86915713

由 Stanislaw Gruszka 提交于 3月 13, 2019

commit 4e50ce03976fbc8ae995a000c4b10c737467beaa upstream.

Take into account that sg->offset can be bigger than PAGE_SIZE when
setting segment sg->dma_address. Otherwise sg->dma_address will point
at diffrent page, what makes DMA not possible with erros like this:

xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]

Additinally with wrong sg->dma_address unmap_sg will free wrong pages,
what what can cause crashes like this:

Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon pfn:39e8b1
Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E)
Feb 28 19:27:45 kernel: scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G B W E 4.20.12-arch1-1-custom #1
Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
Feb 28 19:27:45 kernel: Call Trace:
Feb 28 19:27:45 kernel: dump_stack+0x5c/0x80
Feb 28 19:27:45 kernel: bad_page.cold.29+0x7f/0xb2
Feb 28 19:27:45 kernel: __free_pages_ok+0x2c0/0x2d0
Feb 28 19:27:45 kernel: skb_release_data+0x96/0x180
Feb 28 19:27:45 kernel: __kfree_skb+0xe/0x20
Feb 28 19:27:45 kernel: tcp_recvmsg+0x894/0xc60
Feb 28 19:27:45 kernel: ? reuse_swap_page+0x120/0x340
Feb 28 19:27:45 kernel: ? ptep_set_access_flags+0x23/0x30
Feb 28 19:27:45 kernel: inet_recvmsg+0x5b/0x100
Feb 28 19:27:45 kernel: __sys_recvfrom+0xc3/0x180
Feb 28 19:27:45 kernel: ? handle_mm_fault+0x10a/0x250
Feb 28 19:27:45 kernel: ? syscall_trace_enter+0x1d3/0x2d0
Feb 28 19:27:45 kernel: ? __audit_syscall_exit+0x22a/0x290
Feb 28 19:27:45 kernel: __x64_sys_recvfrom+0x24/0x30
Feb 28 19:27:45 kernel: do_syscall_64+0x5b/0x170
Feb 28 19:27:45 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: stable@vger.kernel.org
Reported-and-tested-by: NJan Viktorin <jan.viktorin@gmail.com>
Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Fixes: 80187fd3 ('iommu/amd: Optimize map_sg and unmap_sg')
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

86915713

drm/vmwgfx: Return 0 when gmrid::get_node runs out of ID's · 69e26237

由 Deepak Rawat 提交于 2月 28, 2019

commit 4b9ce3a651a37c60527101db4451a315a8b9588f upstream.

If it's not a system error and get_node implementation accommodate the
buffer object then it should return 0 with memm::mm_node set to NULL.

v2: Test for id != -ENOMEM instead of id == -ENOSPC.

Cc: <stable@vger.kernel.org>
Fixes: 4eb085e4 ("drm/vmwgfx: Convert to new IDA API")
Signed-off-by: NDeepak Rawat <drawat@vmware.com>
Reviewed-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

69e26237

drm/vmwgfx: Don't double-free the mode stored in par->set_mode · ab483d1c

由 Thomas Zimmermann 提交于 3月 18, 2019

commit c2d311553855395764e2e5bf401d987ba65c2056 upstream.

When calling vmw_fb_set_par(), the mode stored in par->set_mode gets free'd
twice. The first free is in vmw_fb_kms_detach(), the second is near the
end of vmw_fb_set_par() under the name of 'old_mode'. The mode-setting code
only works correctly if the mode doesn't actually change. Removing
'old_mode' in favor of using par->set_mode directly fixes the problem.

Cc: <stable@vger.kernel.org>
Fixes: a278724a ("drm/vmwgfx: Implement fbdev on kms v2")
Signed-off-by: NThomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: NDeepak Rawat <drawat@vmware.com>
Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ab483d1c

mmc: renesas_sdhi: limit block count to 16 bit for old revisions · 42f358b2

由 Wolfram Sang 提交于 3月 19, 2019

commit c9a9497ccef205ed4ed2e247011382627876d831 upstream.

R-Car Gen2 has two different SDHI incarnations in the same chip. The
older one does not support the recently introduced 32 bit register
access to the block count register. Make sure we use this feature only
after the first known version.

Thanks to the Renesas Testing team for this bug report!

Fixes: 5603731a15ef ("mmc: tmio: fix access width of Block Count Register")
Reported-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: NSimon Horman <horms+renesas@verge.net.au>
Tested-by: NPhong Hoang <phong.hoang.wz@renesas.com>
Cc: stable@vger.kernel.org
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

42f358b2

mmc: mxcmmc: "Revert mmc: mxcmmc: handle highmem pages" · 65a5c936

由 Alexander Shiyan 提交于 3月 17, 2019

commit 2b77158ffa92b820a0c5da9a3c6ead7aa069c71c upstream.

This reverts commit b189e758.

Unable to handle kernel paging request at virtual address c8358000
pgd = efa405c3
[c8358000] *pgd=00000000
Internal error: Oops: 805 [#1] PREEMPT ARM
CPU: 0 PID: 711 Comm: kworker/0:2 Not tainted 4.20.0+ #30
Hardware name: Freescale i.MX27 (Device Tree Support)
Workqueue: events mxcmci_datawork
PC is at mxcmci_datawork+0xbc/0x2ac
LR is at mxcmci_datawork+0xac/0x2ac
pc : [<c04e33c8>]    lr : [<c04e33b8>]    psr: 60000013
sp : c6c93f08  ip : 24004180  fp : 00000008
r10: c8358000  r9 : c78b3e24  r8 : c6c92000
r7 : 00000000  r6 : c7bb8680  r5 : c7bb86d4  r4 : c78b3de0
r3 : 00002502  r2 : c090b2e0  r1 : 00000880  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: a68a8000  DAC: 00000055
Process kworker/0:2 (pid: 711, stack limit = 0x389543bc)
Stack: (0xc6c93f08 to 0xc6c94000)
3f00:                   c7bb86d4 00000000 00000000 c6cbfde0 c7bb86d4 c7ee4200
3f20: 00000000 c0907ea8 00000000 c7bb86d8 c0907ea8 c012077c c6cbfde0 c7bb86d4
3f40: c6cbfde0 c6c92000 c6cbfdf4 c09280ba c0907ea8 c090b2e0 c0907ebc c0120c18
3f60: c6cbfde0 00000000 00000000 c6cbb580 c7ba7c40 c7837edc c6cbb598 00000000
3f80: c6cbfde0 c01208f8 00000000 c01254fc c7ba7c40 c0125400 00000000 00000000
3fa0: 00000000 00000000 00000000 c01010d0 00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c04e33c8>] (mxcmci_datawork) from [<c012077c>] (process_one_work+0x1f0/0x338)
[<c012077c>] (process_one_work) from [<c0120c18>] (worker_thread+0x320/0x474)
[<c0120c18>] (worker_thread) from [<c01254fc>] (kthread+0xfc/0x118)
[<c01254fc>] (kthread) from [<c01010d0>] (ret_from_fork+0x14/0x24)
Exception stack(0xc6c93fb0 to 0xc6c93ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e3500000 1a000059 e5153050 e5933038 (e48a3004)
---[ end trace 54ca629b75f0e737 ]---
note: kworker/0:2[711] exited with preempt_count 1
Signed-off-by: NAlexander Shiyan <shc_work@mail.ru>
Fixes: b189e758 ("mmc: mxcmmc: handle highmem pages")
Cc: stable@vger.kernel.org
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

65a5c936

mmc: pxamci: fix enum type confusion · 3b687015

由 Arnd Bergmann 提交于 3月 07, 2019

commit e60a582bcde01158a64ff948fb799f21f5d31a11 upstream.

clang points out several instances of mismatched types in this drivers,
all coming from a single declaration:

drivers/mmc/host/pxamci.c:193:15: error: implicit conversion from enumeration type 'enum dma_transfer_direction' to
      different enumeration type 'enum dma_data_direction' [-Werror,-Wenum-conversion]
                direction = DMA_DEV_TO_MEM;
                          ~ ^~~~~~~~~~~~~~
drivers/mmc/host/pxamci.c:212:62: error: implicit conversion from enumeration type 'enum dma_data_direction' to
      different enumeration type 'enum dma_transfer_direction' [-Werror,-Wenum-conversion]
        tx = dmaengine_prep_slave_sg(chan, data->sg, host->dma_len, direction,

The behavior is correct, so this must be a simply typo from
dma_data_direction and dma_transfer_direction being similarly named
types with a similar purpose.

Fixes: 6464b714 ("mmc: pxamci: switch over to dmaengine use")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
Acked-by: NRobert Jarzmik <robert.jarzmik@free.fr>
Cc: stable@vger.kernel.org
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3b687015

ALSA: firewire-motu: use 'version' field of unit directory to identify model · 6339cc51

由 Takashi Sakamoto 提交于 3月 17, 2019

commit 2d012c65a9ca26a0ef87ea0a42f1653dd37155f5 upstream.

Current ALSA firewire-motu driver uses the value of 'model' field
of unit directory in configuration ROM for modalias for MOTU
FireWire models. However, as long as I checked, Pre8 and
828mk3(Hybrid) have the same value for the field (=0x100800).

unit            | version   | model
--------------- | --------- | ----------
828mkII         | 0x000003  | 0x101800
Traveler        | 0x000009  | 0x107800
Pre8            | 0x00000f  | 0x100800 <-
828mk3(FW)      | 0x000015  | 0x106800
AudioExpress    | 0x000033  | 0x104800
828mk3(Hybrid)  | 0x000035  | 0x100800 <-

When updating firmware for MOTU 8pre FireWire from v1.0.0 to v1.0.3,
I got change of the value from 0x100800 to 0x103800. On the other
hand, the value of 'version' field is fixed to 0x00000f. As a quick
glance, the higher 12 bits of the value of 'version' field represent
firmware version, while the lower 12 bits is unknown.

By induction, the value of 'version' field represents actual model.

This commit changes modalias to match the value of 'version' field,
instead of 'model' field. For degug, long name of added sound card
includes hexadecimal value of 'model' field.

Fixes: 6c5e1ac0 ("ALSA: firewire-motu: add support for Motu Traveler")
Signed-off-by: NTakashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

6339cc51

ALSA: hda - add Lenovo IdeaCentre B550 to the power_save_blacklist · 023a1b28

由 Jaroslav Kysela 提交于 3月 18, 2019

commit 721f1e6c1fd137e7e2053d8e103b666faaa2d50c upstream.

Another machine which does not like the power saving (noise):
  https://bugzilla.redhat.com/show_bug.cgi?id=1689623

Also, reorder the Lenovo C50 entry to keep the table sorted.

Reported-by: hs.guimaraes@outlook.com
Signed-off-by: NJaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

023a1b28

24 3月, 2019 9 次提交

G

Linux 4.19.31 · a2cddfe2
由 Greg Kroah-Hartman 提交于 3月 23, 2019

a2cddfe2

s390/setup: fix boot crash for machine without EDAT-1 · 3053cb97

由 Martin Schwidefsky 提交于 2月 18, 2019

commit 86a86804e4f18fc3880541b3d5a07f4df0fe29cb upstream.

The fix to make WARN work in the early boot code created a problem
on older machines without EDAT-1. The setup_lowcore_dat_on function
uses the pointer from lowcore_ptr[0] to set the DAT bit in the new
PSWs. That does not work if the kernel page table is set up with
4K pages as the prefix address maps to absolute zero.

To make this work the PSWs need to be changed with via address 0 in
form of the S390_lowcore definition.
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Tested-by: NCornelia Huck <cohuck@redhat.com>
Fixes: 94f85ed3e2f8 ("s390/setup: fix early warning messages")
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3053cb97

bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata · e578f90d

由 Coly Li 提交于 2月 09, 2019

commit dc7292a5bcb4c878b076fca2ac3fc22f81b8f8df upstream.

In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
This assumption is not always correct, e.g. XFS uses REQ_META to mark
metadata bio other than REQ_PRIO. This is why Nix noticed that bcache
does not cache metadata for XFS after the above commit.

Thanks to Dave Chinner, he explains the difference between REQ_META and
REQ_PRIO from view of file system developer. Here I quote part of his
explanation from mailing list,
   REQ_META is used for metadata. REQ_PRIO is used to communicate to
   the lower layers that the submitter considers this IO to be more
   important that non REQ_PRIO IO and so dispatch should be expedited.

   IOWs, if the filesystem considers metadata IO to be more important
   that user data IO, then it will use REQ_PRIO | REQ_META rather than
   just REQ_META.

Then it seems bios with REQ_META or REQ_PRIO should both be cached for
performance optimation, because they are all probably low I/O latency
demand by upper layer (e.g. file system).

So in this patch, when we want to decide whether to bypass the cache,
REQ_META and REQ_PRIO are both checked. Then both metadata and
high priority I/O requests will be handled properly.
Reported-by: NNix <nix@esperi.org.uk>
Signed-off-by: NColy Li <colyli@suse.de>
Reviewed-by: NAndre Noll <maan@tuebingen.mpg.de>
Tested-by: NNix <nix@esperi.org.uk>
Cc: stable@vger.kernel.org
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e578f90d

KVM: nVMX: Ignore limit checks on VMX instructions using flat segments · 5ffb710b

由 Sean Christopherson 提交于 1月 23, 2019

commit 34333cc6c2cb021662fd32e24e618d1b86de95bf upstream.

Regarding segments with a limit==0xffffffff, the SDM officially states:

When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
or may not cause the indicated exceptions. Behavior is
implementation-specific and may vary from one execution to another.

In practice, all CPUs that support VMX ignore limit checks for "flat
segments", i.e. an expand-up data or code segment with base=0 and
limit=0xffffffff. This is subtly different than wrapping the effective
address calculation based on the address size, as the flat segment
behavior also applies to accesses that would wrap the 4g boundary, e.g.
a 4-byte access starting at 0xffffffff will access linear addresses
0xffffffff, 0x0, 0x1 and 0x2.

Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5ffb710b

KVM: nVMX: Apply addr size mask to effective address for VMX instructions · 29b515c2

由 Sean Christopherson 提交于 1月 23, 2019

commit 8570f9e881e3fde98801bb3a47eef84dd934d405 upstream.

The address size of an instruction affects the effective address, not
the virtual/linear address.  The final address may still be truncated,
e.g. to 32-bits outside of long mode, but that happens irrespective of
the address size, e.g. a 32-bit address size can yield a 64-bit virtual
address when using FS/GS with a non-zero base.

Fixes: 064aea77 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

29b515c2

KVM: nVMX: Sign extend displacements of VMX instr's mem operands · 9ce0ffeb

由 Sean Christopherson 提交于 1月 23, 2019

commit 946c522b603f281195af1df91837a1d4d1eb3bc9 upstream.

The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
operands for various instructions, including VMX instructions, as a
naturally sized unsigned value, but masks the value by the addr size,
e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
reported as 0xffffffd8 for a 32-bit address size. Despite some weird
wording regarding sign extension, the SDM explicitly states that bits
beyond the instructions address size are undefined:

In all cases, bits of this field beyond the instructionâ€™s address
size are undefined.

Failure to sign extend the displacement results in KVM incorrectly
treating a negative displacement as a large positive displacement when
the address size of the VMX instruction is smaller than KVM's native
size, e.g. a 32-bit address size on a 64-bit KVM.

The very original decoding, added by commit 064aea77 ("KVM: nVMX:
Decoding memory operands of VMX instructions"), sort of modeled sign
extension by truncating the final virtual/linear address for a 32-bit
address size. I.e. it messed up the effective address but made it work
by adjusting the final address.

When segmentation checks were added, the truncation logic was kept
as-is and no sign extension logic was introduced. In other words, it
kept calculating the wrong effective address while mostly generating
the correct virtual/linear address. As the effective address is what's
used in the segment limit checks, this results in KVM incorreclty
injecting #GP/#SS faults due to non-existent segment violations when
a nested VMM uses negative displacements with an address size smaller
than KVM's native address size.

Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
KVM using 0x100000fd8 as the effective address when checking for a
segment limit violation. This causes a 100% failure rate when running
a 32-bit KVM build as L1 on top of a 64-bit KVM L0.

9ce0ffeb

KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux · c235af5a

由 Sean Christopherson 提交于 2月 05, 2019

commit ddfd1730fd829743e41213e32ccc8b4aa6dc8325 upstream.

When installing new memslots, KVM sets bit 0 of the generation number to
indicate that an update is in-progress.  Until the update is complete,
there are no guarantees as to whether a vCPU will see the old or the new
memslots.  Explicity prevent caching MMIO accesses so as to avoid using
an access cached from the old memslots after the new memslots have been
installed.

Note that it is unclear whether or not disabling caching during the
update window is strictly necessary as there is no definitive
documentation as to what ordering guarantees KVM provides with respect
to updating memslots.  That being said, the MMIO spte code does not
allow reusing sptes created while an update is in-progress, and the
associated documentation explicitly states:

    We do not want to use an MMIO sptes created with an odd generation
    number, ...  If KVM is unlucky and creates an MMIO spte while the
    low bit is 1, the next access to the spte will always be a cache miss.

At the very least, disabling the per-vCPU MMIO cache during updates will
make its behavior consistent with the MMIO spte behavior and
documentation.

Fixes: 56f17dd3 ("kvm: x86: fix stale mmio cache bug")
Cc: <stable@vger.kernel.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c235af5a

KVM: x86/mmu: Detect MMIO generation wrap in any address space · 656e9e5d

由 Sean Christopherson 提交于 2月 05, 2019

commit e1359e2beb8b0a1188abc997273acbaedc8ee791 upstream.

The check to detect a wrap of the MMIO generation explicitly looks for a
generation number of zero.  Now that unique memslots generation numbers
are assigned to each address space, only address space 0 will get a
generation number of exactly zero when wrapping.  E.g. when address
space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
wrap to 0x2.  Adjust the MMIO generation to strip the address space
modifier prior to checking for a wrap.

Fixes: 4bd518f1 ("KVM: use separate generations for each address space")
Cc: <stable@vger.kernel.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

656e9e5d

KVM: Call kvm_arch_memslots_updated() before updating memslots · 23ad135a

由 Sean Christopherson 提交于 2月 05, 2019

commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.

kvm_arch_memslots_updated() is at this point in time an x86-specific
hook for handling MMIO generation wraparound. x86 stashes 19 bits of
the memslots generation number in its MMIO sptes in order to avoid
full page fault walks for repeat faults on emulated MMIO addresses.
Because only 19 bits are used, wrapping the MMIO generation number is
possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
the generation has changed so that it can invalidate all MMIO sptes in
case the effective MMIO generation has wrapped so as to avoid using a
stale spte, e.g. a (very) old spte that was created with generation==0.

Given that the purpose of kvm_arch_memslots_updated() is to prevent
consuming stale entries, it needs to be called before the new generation
is propagated to memslots. Invalidating the MMIO sptes after updating
memslots means that there is a window where a vCPU could dereference
the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
spte that was created with (pre-wrap) generation==0.

Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <stable@vger.kernel.org>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

23ad135a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功