1. June 26, 2020: 5 commits
    • Merge tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 42e9c85f
      Authored by Linus Torvalds
      Pull tracing fixes from Steven Rostedt:
       "Four small fixes:
      
         - Fix a ringbuffer bug for nested events having time go backwards
      
         - Fix a config dependency for boot time tracing to depend on
           synthetic events instead of histograms.
      
         - Fix trigger format parsing to handle multiple spaces
      
         - Fix bootconfig to handle failures in multiple events"
      
      * tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/boottime: Fix kprobe multiple events
        tracing: Fix event trigger to accept redundant spaces
        tracing/boot: Fix config dependency for synthedic event
        ring-buffer: Zero out time extend if it is nested and not absolute
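      The trigger fix in this pull concerns trigger strings that contain
      redundant spaces. As a generic illustration only (this is not the
      kernel's trigger parser, and split_ws() is a hypothetical helper), the
      C sketch tokenizes a command in place so that runs of spaces or tabs
      act as a single separator and never yield empty tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split buf in place on runs of spaces/tabs.
 * Stores at most max token pointers in argv and returns the count.
 */
static size_t split_ws(char *buf, char **argv, size_t max)
{
    size_t n = 0;
    char *p = buf;

    while (*p != '\0' && n < max) {
        /* Skip any number of separators before the next token. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0')
            break;
        argv[n++] = p;
        /* Advance to the end of the current token. */
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        /* Terminate the token and step past the separator, if any. */
        if (*p != '\0')
            *p++ = '\0';
    }
    return n;
}
```

      For example, "  traceon   if  pid == 1 " splits into the five tokens
      "traceon", "if", "pid", "==" and "1".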
    • Merge tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 52366a10
      Authored by Linus Torvalds
      Pull fsnotify fixlet from Jan Kara:
       "A performance improvement to reduce impact of fsnotify for inodes
        where it isn't used"
      
      * tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fs: Do not check if there is a fsnotify watcher on pseudo inodes
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 87d93e9a
      Authored by Linus Torvalds
      Pull rdma fixes from Jason Gunthorpe:
       "Several regression fixes from work that landed in the merge window,
        particularly in the mlx5 driver:
      
         - Various static checker and warning fixes
      
         - General bug fixes in rvt, qedr, hns, mlx5 and hfi1
      
         - Several regression fixes related to the ECE and QP changes in last
           cycle
      
         - Fixes for a few long standing crashers in CMA, uverbs ioctl, and
           xrc"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
        IB/hfi1: Add atomic triggered sleep/wakeup
        IB/hfi1: Correct -EBUSY handling in tx code
        IB/hfi1: Fix module use count flaw due to leftover module put calls
        IB/hfi1: Restore kfree in dummy_netdev cleanup
        IB/mad: Fix use after free when destroying MAD agent
        RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata
        RDMA/counter: Query a counter before release
        RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()
        RDMA/mlx5: Fix integrity enabled QP creation
        RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
        RDMA/mlx5: Fix remote gid value in query QP
        RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
        RDMA/core: Check that type_attrs is not NULL prior access
        RDMA/hns: Fix an cmd queue issue when resetting
        RDMA/hns: Fix a calltrace when registering MR from userspace
        RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
        RDMA/cma: Protect bind_list and listen_list while finding matching cm id
        RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532
        RDMA/efa: Set maximum pkeys device attribute
        RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq
        ...
    • Merge tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 908f7d12
      Authored by Linus Torvalds
      Pull s390 fixes from Heiko Carstens:
      
       - Fix kernel crash on system call single stepping.
      
       - Make sure early program check handler is executed with DAT on to
         avoid an endless program check loop.
      
       - Add __GFP_NOWARN flag to debug feature to avoid user triggerable
         allocation failure messages.
      
      * tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/debug: avoid kernel warning on too large number of pages
        s390/kasan: fix early pgm check handler execution
        s390: fix system call single stepping
    • Merge tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · a4d3712b
      Authored by Linus Torvalds
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes gathered in the last two weeks.
      
        The major changes here are fixes for the recent DPCM regressions found
        on i.MX and Qualcomm platforms and fixes for resource leaks in ASoC
        DAI registrations.
      
        Other than those, the changes are mostly device-specific fixes,
        including the usual USB- and HD-audio quirks, a fix for a syzkaller
        case, and ID updates for new Intel platforms"
      
      * tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
        ALSA: usb-audio: Fix OOB access of mixer element list
        ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG)
        ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S
        ASoC: rockchip: Fix a reference count leak.
        ASoC: amd: closing specific instance.
        ALSA: hda: Intel: add missing PCI IDs for ICL-H, TGL-H and EKL
        ASoC: hdac_hda: fix memleak with regmap not freed on remove
        ASoC: SOF: Intel: add PCI IDs for ICL-H and TGL-H
        ASoC: SOF: Intel: add PCI ID for CometLake-S
        ASoC: Intel: SOF: merge COMETLAKE_LP and COMETLAKE_H
        ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems
        ALSA: usb-audio: Fix potential use-after-free of streams
        ALSA: hda/realtek - Add quirk for MSI GE63 laptop
        ASoC: fsl_ssi: Fix bclk calculation for mono channel
        ASoC: SOF: Intel: hda: Clear RIRB status before reading WP
        ASoC: rt1015: Update rt1015 default register value according to spec modification.
        ASoC: qcom: common: set correct directions for dailinks
        ASoc: q6afe: add support to get port direction
        ASoC: soc-pcm: fix checks for multi-cpu FE dailinks
        ASoC: rt5682: Let dai clks be registered whether mclk exists or not
        ...
  2. June 25, 2020: 7 commits
    • Merge tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 8be3a53e
      Authored by Linus Torvalds
      Pull erofs fix from Gao Xiang:
       "Fix a regression, recently observed with specific compiler options,
        in which a potentially uninitialized high 32-bit value is used
        unexpectedly"
      
      * tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · fc10807d
      Authored by Linus Torvalds
      Pull virtio fixes from Michael Tsirkin:
       "Fixes all over the place.
      
        This includes a couple of tests that I would normally defer, but since
        they have already been helpful in catching some bugs, don't build for
        any users at all, and having them upstream makes life easier for
        everyone, I think it's ok even at this late stage"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        tools/virtio: Use tools/include/list.h instead of stubs
        tools/virtio: Reset index in virtio_test --reset.
        tools/virtio: Extract virtqueue initialization in vq_reset
        tools/virtio: Use __vring_new_virtqueue in virtio_test.c
        tools/virtio: Add --reset
        tools/virtio: Add --batch=random option
        tools/virtio: Add --batch option
        virtio-mem: add memory via add_memory_driver_managed()
        virtio-mem: silence a static checker warning
        vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap()
        vdpa: fix typos in the comments for __vdpa_alloc_device()
    • Merge tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · fbb58011
      Authored by Linus Torvalds
      Pull thread fix from Christian Brauner:
       "This fixes a regression introduced with 303cc571 ("nsproxy: attach
        to namespaces via pidfds").
      
        The LTP testsuite reported a regression where users would now see
        EBADF returned instead of EINVAL when an fd was passed that referred
        to an open file but the file was not a namespace file.
      
        Fix this by continuing to report EINVAL and add a regression test"
      
      * tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        tests: test for setns() EINVAL regression
        nsproxy: restore EINVAL for non-namespace file descriptor
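      A minimal userspace check in the spirit of the regression test: open a
      file that is not a namespace file and pass its descriptor to setns(2).
      Per the text above, a fixed kernel reports EINVAL, while the regressed
      one returned EBADF. The function name is hypothetical, and sandboxed
      environments (for example, seccomp filters that block setns) may
      report a different error such as EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Pass setns() a descriptor that refers to an open file which is not a
 * namespace file (/dev/null). Returns the errno observed, 0 if setns()
 * unexpectedly succeeded, or -1 if /dev/null could not be opened.
 */
static int setns_errno_for_regular_file(void)
{
    int fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int err = 0;
    if (setns(fd, 0) == -1)
        err = errno;  /* capture before close() can overwrite errno */
    close(fd);
    return err;
}
```

      On an unsandboxed kernel that includes "nsproxy: restore EINVAL for
      non-namespace file descriptor", the function is expected to return
      EINVAL.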
    • IB/hfi1: Add atomic triggered sleep/wakeup · 38fd98af
      Authored by Mike Marciniszyn
      When running iperf in a two host configuration the following trace can
      occur:
      
      [  319.728730] NETDEV WATCHDOG: ib0 (hfi1): transmit queue 0 timed out
      
      The issue happens because the current implementation relies on the netif
      txq being stopped to control the flushing of the tx list.
      
      There are two resources that the transmit logic can wait on and stop the
      txq:
      - SDMA descriptors
      - Ring space to hold completions
      
      The ring space is tested on the sending side and relieved when the ring is
      consumed in the napi tx reaping.
      
      Unfortunately, that reaping can run concurrently with the workqueue
      flushing of the txlist.  If the txq is started just before the workitem
      executes, the txlist will never be flushed, leading to the txq being
      stuck.
      
      Fix by:
      - Adding sleep/wakeup wrappers
        * Use an atomic to control the call to the netif routines inside the
          wrappers
      
      - Use another atomic to record ring space exhaustion
        * Only wake up when a ring space exhaustion has happened and it
          has been relieved
      
      Add additional wrappers to clarify the ring space resource handling.
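      A minimal single-threaded model of the two-atomic scheme, assuming C11
      atomics stand in for the kernel's atomic_t. All names are hypothetical,
      not hfi1's, and the netif stop/wake calls are replaced by counters so
      the effect of the guards is observable. atomic_exchange() ensures that
      only the caller that actually changes the state makes the netif call,
      and the wake path fires only if ring exhaustion was recorded.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-queue state; names are illustrative only. */
struct txq_state {
    atomic_int stopped;    /* 1 while the (simulated) netif txq is stopped */
    atomic_int ring_full;  /* 1 after completion-ring space ran out */
    int stop_calls;        /* demo counter: simulated netif stop calls */
    int wake_calls;        /* demo counter: simulated netif wake calls */
};

/* Sleep wrapper: only the caller that flips 0 -> 1 stops the queue. */
static void txq_sleep(struct txq_state *q)
{
    if (atomic_exchange(&q->stopped, 1) == 0)
        q->stop_calls++;
}

/* Wakeup wrapper: only the caller that flips 1 -> 0 wakes the queue. */
static void txq_wakeup(struct txq_state *q)
{
    if (atomic_exchange(&q->stopped, 0) == 1)
        q->wake_calls++;
}

/* Sending side: record ring space exhaustion, then stop the queue. */
static void ring_space_exhausted(struct txq_state *q)
{
    atomic_store(&q->ring_full, 1);
    txq_sleep(q);
}

/* Reaping side: wake only if exhaustion had actually been recorded. */
static void ring_space_relieved(struct txq_state *q)
{
    if (atomic_exchange(&q->ring_full, 0) == 1)
        txq_wakeup(q);
}
```

      Repeated exhaustion or relief events collapse into a single stop and
      a single wake, which is the property the wrappers rely on.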
      
      Fixes: d99dc602 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
      Link: https://lore.kernel.org/r/20200623204327.108092.4024.stgit@awfm-01.aw.intel.com
      Reviewed-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/hfi1: Correct -EBUSY handling in tx code · 82172b76
      Authored by Mike Marciniszyn
      The current code mishandles -EBUSY in two ways:
      - The flow change doesn't test the return from the flush and runs on to
        process the current packet racing with the wakeup processing
      - The -EBUSY handling for a single packet inserts the tx into the txlist
        after the submit call, racing with the same wakeup processing
      
      Fix the first by dropping the skb and returning NETDEV_TX_OK.
      
      Fix the second by ensuring the list entry within the txreq is initialized
      when allocated.  This enables the sleep routine to detect that the txreq
      has used the non-list api and queue the packet to the txlist.

      Both flaws can lead to the flushing thread executing concurrently,
      causing two threads to manipulate the txlist.
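      The kernel's struct list_head is circular: INIT_LIST_HEAD() points an
      entry at itself, so list_empty() applied to the entry reports whether
      it has been linked onto any list. The sketch re-creates that behaviour
      in userspace (the kernel header is not available there); struct txreq,
      txreq_alloc() and sleep_queue() are hypothetical names illustrating the
      detection described above, not hfi1 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal userspace re-creation of the kernel's circular list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* An initialized entry that is not on any list points to itself. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical tx request; only the list linkage matters here. */
struct txreq {
    struct list_head list;
};

/* Initialize the linkage at allocation time, as the fix describes. */
static struct txreq *txreq_alloc(void)
{
    struct txreq *tx = malloc(sizeof(*tx));
    if (tx)
        init_list_head(&tx->list);
    return tx;
}

/* Sleep path: a still self-linked entry has not been queued yet. */
static void sleep_queue(struct txreq *tx, struct list_head *txlist)
{
    if (list_empty(&tx->list))
        list_add_tail(&tx->list, txlist);
}
```

      Without the initialization at allocation time, the entry's pointers
      would be indeterminate and list_empty() could not be trusted.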
      
      Fixes: d99dc602 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
      Link: https://lore.kernel.org/r/20200623204321.108092.83898.stgit@awfm-01.aw.intel.com
      Reviewed-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/hfi1: Fix module use count flaw due to leftover module put calls · 822fbd37
      Authored by Dennis Dalessandro
      When the try_module_get calls were removed from opening and closing of the
      i2c debugfs file, the corresponding module_put calls were missed.  This
      results in an inaccurate module use count that requires a power cycle to
      fix.
      
      Fixes: 09fbca8e ("IB/hfi1: No need to use try_module_get for debugfs")
      Link: https://lore.kernel.org/r/20200623203230.106975.76240.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Kaike Wan <kaike.wan@intel.com>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/hfi1: Restore kfree in dummy_netdev cleanup · b46925a2
      Authored by Dennis Dalessandro
      We need to do some rework on the dummy netdev. Calling the free_netdev()
      would normally make sense, and that will be addressed in an upcoming
      patch. For now, just revert the behavior to what it was before, while
      keeping the unused variable removal part of the patch.

      The dd->dummy_netdev is mainly used for packet receiving through
      alloc_netdev_mqs() for typical net devices. As a result, it should be freed
      with kfree instead of free_netdev(), which leads to a crash when unloading
      the hfi1 module:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 8000000855b54067 P4D 8000000855b54067 PUD 84a4f5067 PMD 0
        Oops: 0000 [#1] SMP PTI
        CPU: 73 PID: 10299 Comm: modprobe Not tainted 5.6.0-rc5+ #1
        Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
        RIP: 0010:__hw_addr_flush+0x12/0x80
        Code: 40 00 48 83 c4 08 4c 89 e7 5b 5d 41 5c e9 76 77 18 00 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 1f 48 39 df <48> 8b 2b 75 08 eb 4a 48 89 eb 48 89 c5 48 89 df e8 99 bf d0 ff 84
        RSP: 0018:ffffb40e08783db8 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
        RDX: ffffb40e00000000 RSI: 0000000000000246 RDI: ffff88ab13662298
        RBP: ffff88ab13662000 R08: 0000000000001549 R09: 0000000000001549
        R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff88ab13662298
        R13: ffff88ab1b259e20 R14: ffff88ab1b259e42 R15: 0000000000000000
        FS:  00007fb39b534740(0000) GS:ffff88b31f940000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000084d3ea004 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         dev_addr_flush+0x15/0x30
         free_netdev+0x7e/0x130
         hfi1_netdev_free+0x59/0x70 [hfi1]
         remove_one+0x65/0x110 [hfi1]
         pci_device_remove+0x3b/0xc0
         device_release_driver_internal+0xec/0x1b0
         driver_detach+0x46/0x90
         bus_remove_driver+0x58/0xd0
         pci_unregister_driver+0x26/0xa0
         hfi1_mod_cleanup+0xc/0xd54 [hfi1]
         __x64_sys_delete_module+0x16c/0x260
         ? exit_to_usermode_loop+0xa4/0xc0
         do_syscall_64+0x5b/0x200
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 193ba031 ("IB/hfi1: Use free_netdev() in hfi1_netdev_free()")
      Link: https://lore.kernel.org/r/20200623203224.106975.16926.stgit@awfm-01.aw.intel.com
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  3. June 24, 2020: 7 commits
  4. June 23, 2020: 21 commits
    • ring-buffer: Zero out time extend if it is nested and not absolute · 097350d1
      Authored by Steven Rostedt (VMware)
      Currently the ring buffer makes events that happen in interrupts that preempt
      another event have a delta of zero. (Hopefully we can change this soon). But
      this is to deal with the races of updating a global counter with lockless
      and nesting functions updating deltas.
      
      With the addition of absolute time stamps, the time extend didn't follow
      this rule. A time extend can happen if two events happen more than 2^27
      nanoseconds apart, as the delta time field in each event is only 27 bits.
      If that happens, then a time extend is injected with 59 bits of
      nanoseconds to use (about 18 years). But if the 2^27 nanoseconds happen between
      two events, and as it is writing the event, an interrupt triggers, it will
      see the 2^27 difference as well and inject a time extend of its own. But a
      recent change made the time extend logic not take into account the nesting,
      and this can cause two time extend deltas to happen moving the time stamp
      much further ahead than the current time. This gets all reset when the ring
      buffer moves to the next page, but that can cause time to appear to go
      backwards.
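      The limits above work out as follows: a 27-bit delta holds at most
      2^27 - 1 = 134,217,727 ns (about 134 ms), while 59 bits cover 2^59 ns,
      roughly 5.76 x 10^17 ns or about 18.3 years. The sketch is a minimal
      userspace model, not the kernel's ring_buffer.c code; the helper names
      are hypothetical, and only the 27-bit width and the "zero the delta
      when nested and not absolute" rule come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_SHIFT     27
#define TS_DELTA_MAX ((1ULL << TS_SHIFT) - 1)  /* largest delta an event header holds */

/* A time extend is needed only when the delta does not fit in 27 bits. */
static bool needs_time_extend(uint64_t delta)
{
    return delta > TS_DELTA_MAX;
}

/*
 * Hypothetical helper: the delta a time extend should carry.
 *   delta    - nanoseconds since the previous event
 *   nested   - true if this write interrupted another in-progress write
 *   absolute - true if the buffer records absolute time stamps
 * A nested, non-absolute writer records 0 so that it cannot add a second
 * large delta on top of the one the interrupted writer is injecting.
 */
static uint64_t time_extend_delta(uint64_t delta, bool nested, bool absolute)
{
    if (nested && !absolute)
        return 0;
    return delta;
}
```

      With the delta from the trace in this commit (306454770 ns, about
      306 ms), needs_time_extend() is true, and a nested writer would record
      a zero extend instead of a duplicate one.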
      
      This was observed in a trace-cmd recording, and since the data is saved in a
      file, with trace-cmd report --debug, it was possible to see that this indeed
      did happen!
      
        bash-52501   110d... 81778.908247: sched_switch:         bash:52501 [120] S ==> swapper/110:0 [120] [12770284:0x2e8:64]
        <idle>-0     110d... 81778.908757: sched_switch:         swapper/110:0 [120] R ==> bash:52501 [120] [509947:0x32c:64]
       TIME EXTEND: delta:306454770 length:0
        bash-52501   110.... 81779.215212: sched_swap_numa:      src_pid=52501 src_tgid=52388 src_ngid=52501 src_cpu=110 src_nid=2 dst_pid=52509 dst_tgid=52388 dst_ngid=52501 dst_cpu=49 dst_nid=1 [0:0x378:48]
       TIME EXTEND: delta:306458165 length:0
        bash-52501   110dNh. 81779.521670: sched_wakeup:         migration/110:565 [0] success=1 CPU:110 [0:0x3b4:40]
      
      and at the next page, caused the time to go backwards:
      
        bash-52504   110d... 81779.685411: sched_switch:         bash:52504 [120] S ==> swapper/110:0 [120] [8347057:0xfb4:64]
      CPU:110 [SUBBUFFER START] [81779379165886:0x1320000]
        <idle>-0     110dN.. 81779.379166: sched_wakeup:         bash:52504 [120] success=1 CPU:110 [0:0x10:40]
        <idle>-0     110d... 81779.379167: sched_switch:         swapper/110:0 [120] R ==> bash:52504 [120] [1168:0x3c:64]
      
      Link: https://lkml.kernel.org/r/20200622151815.345d1bf5@oasis.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: dc4e2801 ("ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP")
      Reported-by: Julia Lawall <julia.lawall@inria.fr>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG) · a32a1fc9
      Authored by Macpaul Lin
      We've found that the Samsung USBC Headset (AKG) (VID: 0x04e8, PID: 0xa051)
      needs a tiny delay after each class-compliant request.
      Otherwise, the device might not be recognized every time.
      Signed-off-by: Chihhao Chen <chihhao.chen@mediatek.com>
      Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/1592910203-24035-1-git-send-email-macpaul.lin@mediatek.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • s390/debug: avoid kernel warning on too large number of pages · 827c4913
      Authored by Christian Borntraeger
      When specifying insanely large debug buffers a kernel warning is
      printed. The debug code does handle the error gracefully, though.
      Instead of duplicating the check let us silence the warning to
      avoid crashes when panic_on_warn is used.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    • s390/kasan: fix early pgm check handler execution · 998f5bbe
      Authored by Vasily Gorbik
      Currently, if early_pgm_check_handler is called, it ends up in a pgm check
      loop. The problem is that early_pgm_check_handler is instrumented by
      KASAN but executed without DAT flag enabled which leads to addressing
      exception when KASAN checks try to access shadow memory.
      
      Fix that by executing early handlers with DAT flag on under KASAN as
      expected.
      Reported-and-tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    • s390: fix system call single stepping · e64a1618
      Authored by Sven Schnelle
      When single stepping an svc instruction on s390, the kernel is entered
      with a PER program check interruption. The program check handler then
      jumps to the system call handler by reloading the PSW. The code didn't
      set GPR13 to the thread pointer in struct task_struct. This made the
      kernel access invalid memory while trying to fetch the syscall function
      address. Fix this by always assigning GPR13 after .Lsysc_per.
      
      Fixes: 0b0ed657 ("s390: remove critical section cleanup from entry.S")
      Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    • ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S · 73094608
      Authored by Christoffer Nielsen
      Similar to the Kingston HyperX AMP, the Kingston HyperX Cloud
      Alpha S (0951:0x16ea) uses two interfaces, but only the second
      interface contains the capture stream. This patch delays the
      registration until the second interface appears.
      Signed-off-by: Christoffer Nielsen <cn@obviux.dk>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/CAOtG2YHOM3zy+ed9KS-J4HkZo_QGzcUG9MigSp4e4_-13r6B=Q@mail.gmail.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru · e4553b49
      Authored by Sean Christopherson
      Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was
      moved to common x86 code.
      
      No functional change intended.
      
      Fixes: 37486135 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling · 26769f96
      Authored by Marcelo Tosatti
      The Linux TSC calibration procedure is subject to small variations
      (it's common to see a ±1 kHz difference between reboots on a given CPU, for example).

      So migrating a guest between two hosts with identical processors can fail
      if there is a small variation in calibrated TSC between them.
      
      Without TSC scaling, the current kernel interface will either return an error
      (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.
      
      This change enables the following TSC tolerance check to
      accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default).
      
              /*
               * Compute the variation in TSC rate which is acceptable
               * within the range of tolerance and decide if the
               * rate being applied is within that bounds of the hardware
               * rate.  If so, no scaling or compensation need be done.
               */
              thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
              thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
              if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
                      pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
                      use_scaling = 1;
              }
      
      An NTP daemon in the guest can correct this difference (NTP can correct up to 500 ppm).
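      A userspace sketch of the tolerance arithmetic in the quoted snippet.
      The adjust_tsc_khz() body here is an assumption that scales the rate
      by (1,000,000 + ppm) / 1,000,000, and tsc_within_tolerance() and the
      2.4 GHz host rate are illustrative only. With the default 250 ppm, a
      2,400,000 kHz host accepts requested rates from 2,399,400 to
      2,400,600 kHz, so a ±1 kHz calibration difference is well inside the
      window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scale a kHz rate by a parts-per-million adjustment (ppm may be negative). */
static uint32_t adjust_tsc_khz(uint32_t khz, int32_t ppm)
{
    uint64_t v = (uint64_t)khz * (uint64_t)(1000000 + ppm);
    return (uint32_t)(v / 1000000);
}

/* Hypothetical helper: true if user_tsc_khz lies in [thresh_lo, thresh_hi]. */
static bool tsc_within_tolerance(uint32_t tsc_khz, uint32_t user_tsc_khz,
                                 int32_t tolerance_ppm)
{
    uint32_t thresh_lo = adjust_tsc_khz(tsc_khz, -tolerance_ppm);
    uint32_t thresh_hi = adjust_tsc_khz(tsc_khz, tolerance_ppm);
    return user_tsc_khz >= thresh_lo && user_tsc_khz <= thresh_hi;
}
```

      When the check passes, neither scaling nor catchup is needed, and the
      residual drift is left to NTP in the guest.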
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

      Message-Id: <20200616114741.GA298183@fuller.cnet>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Fix MSR range of APIC registers in X2APIC mode · bf10bd0b
      Authored by Xiaoyao Li
      Only MSR address range 0x800 through 0x8ff is architecturally reserved
      and dedicated for accessing APIC registers in x2APIC mode.
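      A minimal sketch of the range check this fix describes. The constant
      and function names are hypothetical; the 0x800 to 0x8ff window comes
      from the commit. The register mapping follows the x2APIC convention
      that each MSR index corresponds to one 16-byte xAPIC MMIO register, so
      256 MSRs span the 4 KiB register page (for example, MMIO offset 0x80
      maps to MSR 0x808).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_FIRST 0x800u
#define X2APIC_MSR_LAST  0x8ffu

/* True only for MSRs in the architecturally reserved x2APIC window. */
static bool is_x2apic_msr(uint32_t msr)
{
    return msr >= X2APIC_MSR_FIRST && msr <= X2APIC_MSR_LAST;
}

/* Convert an x2APIC MSR index to the matching xAPIC MMIO register offset. */
static uint32_t x2apic_msr_to_reg(uint32_t msr)
{
    return (msr - X2APIC_MSR_FIRST) << 4;
}
```

      MSRs just outside the window (0x7ff, 0x900) are rejected, so they are
      not treated as APIC registers.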
      
      Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL · bf09fb6c
      Authored by Sean Christopherson
      Remove support for context switching between the guest's and host's
      desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
      required for correct functionality, e.g. KVM intercepts reads and writes
      to the MSR, and the latency effects of the settings controlled by the
      MSR are not architecturally visible.
      
      As a general rule, KVM should not allow the guest to control power
      management settings unless explicitly enabled by userspace, e.g. see
      KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
      can improve the performance of SMT siblings.  A devious guest could
      disable C0.2 so as to improve the performance of their workloads to the
      detriment of workloads running in the host or on other VMs.
      
      Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
      condition where updates from the host may cause KVM to enter the guest
      with the incorrect value.  Because updates are propagated to all
      CPUs via IPI (SMP function callback), the value in hardware may be
      stale with respect to the cached value and KVM could enter the guest
      with the wrong value in hardware.  As above, the guest can't observe the
      bad value, but it's a weird and confusing wart in the implementation.
      
      Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
      lists.  Using the lists is only necessary for MSRs that are required for
      correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
      old hardware, or for MSRs that need to-the-uop precision, e.g. perf
      related MSRs.  For UMWAIT_CONTROL, the effects are only visible in the
      kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
      vcpu_vmx_run().  Using the atomic lists is undesirable as they are more
      expensive than direct RDMSR/WRMSR.
      
      Furthermore, even if giving the guest control of the MSR is legitimate,
      e.g. in pass-through scenarios, it's not clear that the benefits would
      outweigh the overhead.  E.g. saving and restoring an MSR across a VMX
      roundtrip costs ~250 cycles, and if the guest diverged from the host
      that cost would be paid on every run of the guest.  In other words, if
      there is a legitimate use case then it should be enabled by a new
      per-VM capability.
      
      Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
      correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
      UMWAIT and UMONITOR.
      
      Fixes: 6e3ba4ab ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
      Cc: stable@vger.kernel.org
      Cc: Jingqi Liu <jingqi.liu@intel.com>
      Cc: Tao Xu <tao3.xu@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Plumb L2 GPA through to PML emulation · 2dbebf7a
      Authored by Sean Christopherson
      Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all
      intents and purposes is vmx_write_pml_buffer(), instead of having the
      latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS.  If the dirty bit
      update is the result of KVM emulation (rare for L2), then the GPA in the
      VMCS may be stale and/or hold a completely unrelated GPA.
      
      Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • IB/mad: Fix use after free when destroying MAD agent · 116a1b9f
      Authored by Shay Drory
      Currently, when RMPP MADs are processed while the MAD agent is destroyed,
      it could result in a use-after-free of rmpp_recv, as described below:
      
      	cpu-0						cpu-1
      	-----						-----
      ib_mad_recv_done()
       ib_mad_complete_recv()
        ib_process_rmpp_recv_wc()
      						unregister_mad_agent()
      						 ib_cancel_rmpp_recvs()
      						  cancel_delayed_work()
         process_rmpp_data()
          start_rmpp()
           queue_delayed_work(rmpp_recv->cleanup_work)
      						  destroy_rmpp_recv()
      						   free_rmpp_recv()
           cleanup_work()[1]
            spin_lock_irqsave(&rmpp_recv->agent->lock) <-- use after free
      
      [1] cleanup_work() == recv_cleanup_handler
      
      Fix it by waiting for the MAD agent reference count to become zero before
      calling ib_cancel_rmpp_recvs().
      
      Fixes: 9a41e38a ("IB/mad: Use IDR for agent IDs")
      Link: https://lore.kernel.org/r/20200621104738.54850-2-leon@kernel.org
      Signed-off-by: Shay Drory <shayd@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata · 6eefa839
      Authored by Leon Romanovsky
      Don't deref udata if it is NULL
      
        BUG: kernel NULL pointer dereference, address: 0000000000000030
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000   SMP PTI
        CPU: 2 PID: 1592 Comm: python3 Not tainted 5.7.0-rc6+ #1
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
        RIP: 0010:create_qp+0x39e/0xae0 [mlx5_ib]
        Code: c0 0d 00 00 bf 10 01 00 00 e8 be a9 e4 e0 48 85 c0 49 89 c2 0f 84 0c 07 00 00 41 8b 85 74 63 01 00 0f c8 a9 00 00 00 10 74 0a <41> 8b 46 30 0f c8 41 89 42 14 41 8b 52 18 41 0f b6 4a 1c 0f ca 89
        RSP: 0018:ffffc9000067f8b0 EFLAGS: 00010206
        RAX: 0000000010170000 RBX: ffff888441313000 RCX: 0000000000000000
        RDX: 0000000000000200 RSI: 0000000000000000 RDI: ffff88845b1d4400
        RBP: ffffc9000067fa60 R08: 0000000000000200 R09: ffff88845b1d4200
        R10: ffff88845b1d4200 R11: ffff888441313000 R12: ffffc9000067f950
        R13: ffff88846ac00140 R14: 0000000000000000 R15: ffff88846c2bc000
        FS:  00007faa1a3c0540(0000) GS:ffff88846fd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000030 CR3: 0000000446dca003 CR4: 0000000000760ea0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         mlx5_ib_create_qp+0x897/0xfa0 [mlx5_ib]
         ib_create_qp+0x9e/0x300 [ib_core]
         create_qp+0x92d/0xb20 [ib_uverbs]
         ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs]
         ? release_resource+0x30/0x30
         ib_uverbs_create_qp+0xc4/0xe0 [ib_uverbs]
         ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc8/0xf0 [ib_uverbs]
         ib_uverbs_run_method+0x223/0x770 [ib_uverbs]
         ? track_pfn_remap+0xa7/0x100
         ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
         ? remap_pfn_range+0x358/0x490
         ib_uverbs_cmd_verbs.isra.6+0x19b/0x370 [ib_uverbs]
         ? rdma_umap_priv_init+0x82/0xe0 [ib_core]
         ? vm_mmap_pgoff+0xec/0x120
         ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
         ksys_ioctl+0x92/0xb0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x48/0x130
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
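The shape of the fix is a plain NULL guard before touching udata. The sketch below is a simplified, hypothetical model (struct user_data, struct qp_ctx and set_ece_options() are illustrative names, not the mlx5 driver's actual code): user-supplied ECE options are read only when udata exists, and a default is used otherwise, as happens when an XRC_TGT QP is created without udata.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: udata carries optional user-supplied options.
 * A caller without user data (e.g. XRC_TGT here) passes NULL. */
struct user_data {
    uint32_t ece_options;
};

struct qp_ctx {
    uint32_t ece;
};

/* Read the user-supplied value only when udata was actually provided. */
static void set_ece_options(struct qp_ctx *qp, const struct user_data *udata)
{
    qp->ece = udata ? udata->ece_options : 0;
}
```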
      
      Fixes: e383085c ("RDMA/mlx5: Set ECE options during QP create")
      Link: https://lore.kernel.org/r/20200621115959.60126-1-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      6eefa839
    • V
      KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic() · 312d16c7
      Vitaly Kuznetsov committed
      translate_gpa() returns a GPA, so assigning its result to 'real_gfn' is
      obviously wrong. There is no actual bug, because both 'gpa_t' and 'gfn_t'
      are u64 and the value in 'real_gfn' is never used as a GFN; we do
      
       real_gfn = gpa_to_gfn(real_gfn);
      
      instead. 'If you see a "buffalo" sign on an elephant's cage, do not trust
      your eyes', but let's fix it for good.
      
      No functional change intended.
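The distinction between the two types can be shown with a small sketch, assuming 4 KiB pages (PAGE_SHIFT = 12, as on x86). The helpers mirror the kernel's gpa_to_gfn()/gfn_to_gpa() conversions; walk_step() is a hypothetical name used only to illustrate keeping the GPA in a gpa_t variable and converting it to a GFN exactly once.

```c
#include <stdint.h>

typedef uint64_t gpa_t;   /* guest physical address */
typedef uint64_t gfn_t;   /* guest frame number */

#define PAGE_SHIFT 12     /* assumes 4 KiB pages, as on x86 */

static inline gfn_t gpa_to_gfn(gpa_t gpa) { return (gfn_t)(gpa >> PAGE_SHIFT); }
static inline gpa_t gfn_to_gpa(gfn_t gfn) { return (gpa_t)gfn << PAGE_SHIFT; }

/* Hypothetical walk step: the translated address stays in a gpa_t and is
 * converted once, so each variable holds what its type says it holds. */
static gfn_t walk_step(gpa_t real_gpa)
{
    gfn_t real_gfn = gpa_to_gfn(real_gpa);
    return real_gfn;
}
```

Because both typedefs are u64, the compiler cannot catch the mix-up; the type names exist purely to document intent, which is why the mismatch was harmless yet still worth fixing.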
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200622151435.752560-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      312d16c7
    • P
      KVM: LAPIC: ensure APIC map is up to date on concurrent update requests · 44d52717
      Paolo Bonzini committed
      The following race can cause lost map update events:
      
               cpu1                            cpu2
      
                                      apic_map_dirty = true
        ------------------------------------------------------------
                                      kvm_recalculate_apic_map:
                                           pass check
                                               mutex_lock(&kvm->arch.apic_map_lock);
                                               if (!kvm->arch.apic_map_dirty)
                                           and in process of updating map
        -------------------------------------------------------------
          other calls to
             apic_map_dirty = true         might be too late for affected cpu
        -------------------------------------------------------------
                                           apic_map_dirty = false
        -------------------------------------------------------------
          kvm_recalculate_apic_map:
          bail out on
            if (!kvm->arch.apic_map_dirty)
      
      To fix it, record the beginning of an update of the APIC map in
      apic_map_dirty.  If another APIC map change switches apic_map_dirty
      back to DIRTY during the update, kvm_recalculate_apic_map should not
      make it CLEAN, and the other caller will go through the slow path.
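The three-state scheme described above can be sketched in userspace with C11 atomics and a pthread mutex. This is an illustrative model under those assumptions, not the KVM code itself: struct vm, mark_dirty(), recalculate_apic_map() and map_version are hypothetical names, with map_version standing in for the rebuilt map. The important step is the final compare-and-swap, which moves UPDATE_IN_PROGRESS to CLEAN only if no concurrent change set the state back to DIRTY during the rebuild.

```c
#include <pthread.h>
#include <stdatomic.h>

enum { CLEAN, UPDATE_IN_PROGRESS, DIRTY };

struct vm {
    atomic_int      apic_map_dirty;
    pthread_mutex_t apic_map_lock;
    int             map_version;     /* stands in for the rebuilt map */
};

/* Any APIC change marks the map dirty, even while an update runs. */
static void mark_dirty(struct vm *vm)
{
    atomic_store(&vm->apic_map_dirty, DIRTY);
}

static void recalculate_apic_map(struct vm *vm)
{
    if (atomic_load(&vm->apic_map_dirty) == CLEAN)
        return;                                  /* fast path */

    pthread_mutex_lock(&vm->apic_map_lock);

    /* Record the start of the update: DIRTY -> UPDATE_IN_PROGRESS. */
    int expected = DIRTY;
    atomic_compare_exchange_strong(&vm->apic_map_dirty, &expected,
                                   UPDATE_IN_PROGRESS);
    if (atomic_load(&vm->apic_map_dirty) == CLEAN) {
        /* Another caller finished the update while we waited. */
        pthread_mutex_unlock(&vm->apic_map_lock);
        return;
    }

    vm->map_version++;                            /* rebuild the map */

    /* Go CLEAN only if nobody re-dirtied the map meanwhile; otherwise it
     * stays DIRTY and the next caller takes the slow path. */
    expected = UPDATE_IN_PROGRESS;
    atomic_compare_exchange_strong(&vm->apic_map_dirty, &expected, CLEAN);

    pthread_mutex_unlock(&vm->apic_map_lock);
}
```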
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      44d52717
    • M
      RDMA/counter: Query a counter before release · c1d869d6
      Mark Zhang committed
      Query a dynamically allocated counter before releasing it, so that its
      hwcounters are updated and all of them are logged into the history data.
      Otherwise, all values of these hwcounters would be lost.
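A minimal model of the release ordering follows, assuming a fixed number of hardware counters and an array standing in for the device's live values. All names (struct counter, struct port_stats, counter_query(), counter_release(), NUM_HWCOUNTERS) are hypothetical. The point is that the counter is refreshed from the device before its values are folded into the history and the memory is freed; without the query, only stale cached values would be recorded.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_HWCOUNTERS 4   /* hypothetical counter count */

/* Cached values of a dynamically allocated counter. */
struct counter {
    uint64_t hw[NUM_HWCOUNTERS];
};

/* History of counters that have already been released. */
struct port_stats {
    uint64_t history[NUM_HWCOUNTERS];
};

/* Stand-in for querying the device: refresh the cached values. */
static void counter_query(struct counter *c, const uint64_t *device_vals)
{
    for (int i = 0; i < NUM_HWCOUNTERS; i++)
        c->hw[i] = device_vals[i];
}

/* Query first, then fold into history, then free. */
static void counter_release(struct port_stats *ps, struct counter *c,
                            const uint64_t *device_vals)
{
    counter_query(c, device_vals);
    for (int i = 0; i < NUM_HWCOUNTERS; i++)
        ps->history[i] += c->hw[i];
    free(c);
}
```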
      
      Fixes: f34a55e4 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
      Link: https://lore.kernel.org/r/20200621110000.56059-1-leon@kernel.org
      Signed-off-by: Mark Zhang <markz@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      c1d869d6
    • L
      Merge tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · dd0d7181
      Linus Torvalds committed
      Pull spi fixes from Mark Brown:
       "Quite a lot of fixes here for no single reason.
      
        There's a collection of the usual sort of device specific fixes and
        also a bunch of people have been working on spidev and the userspace
        test program spidev_test so they've got an unusually large collection
        of small fixes"
      
      * tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spidev: fix a potential use-after-free in spidev_release()
        spi: spidev: fix a race between spidev_release and spidev_remove
        spi: stm32-qspi: Fix error path in case of -EPROBE_DEFER
        spi: uapi: spidev: Use TABs for alignment
        spi: spi-fsl-dspi: Free DMA memory with matching function
        spi: tools: Add macro definitions to fix build errors
        spi: tools: Make default_tx/rx and input_tx static
        spi: dt-bindings: amlogic, meson-gx-spicc: Fix schema for meson-g12a
        spi: rspi: Use requested instead of maximum bit rate
        spi: spidev_test: Use %u to format unsigned numbers
        spi: sprd: switch the sequence of setting WDG_LOAD_LOW and _HIGH
      dd0d7181
    • I
      kvm: lapic: fix broken vcpu hotplug · af28dfac
      Igor Mammedov committed
      The guest fails to online a hotplugged CPU with the error:
        smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
      
      This happens because kvm_apic_set_state(), which used to call
      recalculate_apic_map() unconditionally and thereby pulled the hotplugged
      CPU into the APIC map, now updates the map only on state changes. In this
      case the APIC map is not considered dirty, so it is not updated.
      
      Fix the issue by forcing an unconditional map update from
      kvm_apic_set_state(), as was done before.
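The bug and the fix can be modelled with a dirty flag that gates the rebuild. In this hypothetical sketch (struct vm_state, recalc_map() and apic_set_state() are illustrative names, not KVM's), the conditional rebuild skips a newly onlined CPU when nothing marked the map dirty; the set-state path therefore marks the map dirty itself before recalculating, which forces the update unconditionally.

```c
#include <stdbool.h>

/* Hypothetical model: the map is rebuilt only when marked dirty. */
struct vm_state {
    bool map_dirty;
    int  cpus_in_map;
    int  online_cpus;
};

static void recalc_map(struct vm_state *vm)
{
    if (!vm->map_dirty)
        return;                   /* conditional path can miss a new CPU */
    vm->cpus_in_map = vm->online_cpus;
    vm->map_dirty = false;
}

/* Setting the whole APIC state (e.g. for a hotplugged vCPU) forces the
 * rebuild by marking the map dirty first. */
static void apic_set_state(struct vm_state *vm)
{
    vm->map_dirty = true;
    recalc_map(vm);
}
```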
      
      Fixes: 4abaffce ("KVM: LAPIC: Recalculate apic map in batch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Message-Id: <20200622160830.426022-1-imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      af28dfac
    • L
      Merge tag 'regulator-fix-v5.8-rc2' of... · 75164578
      Linus Torvalds committed
      Merge tag 'regulator-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "This has a fix for the refactoring out of the pickable ranges
        functionality, plus the removal of a BROKEN dependency on mt6358 now
        that the dependencies were merged in -rc1 and a couple of device
        specific fixes"
      
      * tag 'regulator-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: mt6358: Remove BROKEN dependency
        regulator: pfuze100: correct sw1a/sw2 on pfuze3000
        regulator: Fix pickable ranges mapping
        regulator: da9063: fix LDO9 suspend and warning.
      75164578
    • L
      Merge tag 'regmap-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 2a000870
      Linus Torvalds committed
      Pull regmap fixes from Mark Brown:
       "A few small fixes, none of which are likely to have any substantial
        impact here - the most substantial one is a fix for a long standing
        memory leak on devices that use register patching which will only have
        an impact if the device is removed and re-added"
      
      * tag 'regmap-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: Fix memory leak from regmap_register_patch
        regmap: fix the kerneldoc for regmap_test_bits()
        regmap: fix alignment issue
      2a000870
    • E
      tools/virtio: Use tools/include/list.h instead of stubs · cb91909e
      Eugenio Pérez committed
      This should not make any significant difference, but it reduces the amount of stub code.
      Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
      Link: https://lore.kernel.org/r/20200418102217.32327-9-eperezma@redhat.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      cb91909e