1. 19 4月, 2019 3 次提交
    • A
      mlxsw: spectrum: Fix autoneg status in ethtool · 151f0ddd
      Amit Cohen 提交于
      If link is down and autoneg is set to on/off, the status in ethtool does
      not change.
      
      The reason is when the link is down the function returns with zero
      before changing autoneg value.
      
      Move the checking of link state (up/down) to be performed after setting
      autoneg value, in order to be sure that autoneg will change in any case.
      
      Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
      Signed-off-by: NAmit Cohen <amitc@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      151f0ddd
    • I
      mlxsw: pci: Reincrease PCI reset timeout · 1ab30301
      Ido Schimmel 提交于
      During driver initialization the driver sends a reset to the device and
      waits for the firmware to signal that it is ready to continue.
      
      Commit d2f372ba ("mlxsw: pci: Increase PCI SW reset timeout")
      increased the timeout to 13 seconds due to longer PHY calibration in
      Spectrum-2 compared to Spectrum-1.
      
      Recently it became apparent that this timeout is too short and therefore
      this patch increases it again to a safer limit that will be reduced in
      the future.
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Fixes: d2f372ba ("mlxsw: pci: Increase PCI SW reset timeout")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ab30301
    • P
      mlxsw: spectrum: Put MC TCs into DWRR mode · f476b3f8
      Petr Machata 提交于
      Both Spectrum-1 and Spectrum-2 chips are currently configured such that
      pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
      for MC traffic) are feeding into the same subgroup. Strict
      prioritization is configured between the two TCs, and by enabling
      MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
      over the higher-numbered (MC) TCs.
      
      On Spectrum-2 however, there is an issue in configuration of the
      MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
      To work around the issue, configure the MC TCs with DWRR mode (while
      keeping the UC TCs in strict mode).
      
      With this patch, the multicast-unicast arbitration results in the same
      behavior on both Spectrum-1 and Spectrum-2 chips.
      
      Fixes: 7b819530 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f476b3f8
  2. 18 4月, 2019 6 次提交
    • A
      s390: ctcm: fix ctcm_new_device error return code · 27b141fc
      Arnd Bergmann 提交于
      clang points out that the return code from this function is
      undefined for one of the error paths:
      
      ../drivers/s390/net/ctcm_main.c:1595:7: warning: variable 'result' is used uninitialized whenever 'if' condition is true
            [-Wsometimes-uninitialized]
                      if (priv->channel[direction] == NULL) {
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ../drivers/s390/net/ctcm_main.c:1638:9: note: uninitialized use occurs here
              return result;
                     ^~~~~~
      ../drivers/s390/net/ctcm_main.c:1595:3: note: remove the 'if' if its condition is always false
                      if (priv->channel[direction] == NULL) {
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ../drivers/s390/net/ctcm_main.c:1539:12: note: initialize the variable 'result' to silence this warning
              int result;
                        ^
      
      Make it return -ENODEV here, as in the related failure cases.
      gcc has a known bug in underreporting some of these warnings
      when it has already eliminated the assignment of the return code
      based on some earlier optimization step.
      Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27b141fc
    • C
      nfp: abm: fix spelling mistake "offseting" -> "offsetting" · d003d772
      Colin Ian King 提交于
      There are a couple of spelling mistakes in NL_SET_ERR_MSG_MOD error
      messages. Fix these.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NMukesh Ojha <mojha@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d003d772
    • Y
      net: stmmac: Use bfsize1 in ndesc_init_rx_desc · f87db4db
      YueHaibing 提交于
      gcc warn this:
      
      drivers/net/ethernet/stmicro/stmmac/norm_desc.c: In function ndesc_init_rx_desc:
      drivers/net/ethernet/stmicro/stmmac/norm_desc.c:138:6: warning: variable 'bfsize1' set but not used [-Wunused-but-set-variable]
      
      Like enh_desc_init_rx_desc, we should use bfsize1
      in ndesc_init_rx_desc to calculate 'p->des1'
      
      Fixes: 583e6361 ("net: stmmac: use correct DMA buffer size in the RX descriptor")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f87db4db
    • Z
      ipv4: set the tcp_min_rtt_wlen range from 0 to one day · 19fad20d
      ZhangXiaoxu 提交于
      There is a UBSAN report as below:
      UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
      signed integer overflow:
      2147483647 * 1000 cannot be represented in type 'int'
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e3 #1
      Call Trace:
       <IRQ>
       dump_stack+0x8c/0xba
       ubsan_epilogue+0x11/0x60
       handle_overflow+0x12d/0x170
       ? ttwu_do_wakeup+0x21/0x320
       __ubsan_handle_mul_overflow+0x12/0x20
       tcp_ack_update_rtt+0x76c/0x780
       tcp_clean_rtx_queue+0x499/0x14d0
       tcp_ack+0x69e/0x1240
       ? __wake_up_sync_key+0x2c/0x50
       ? update_group_capacity+0x50/0x680
       tcp_rcv_established+0x4e2/0xe10
       tcp_v4_do_rcv+0x22b/0x420
       tcp_v4_rcv+0xfe8/0x1190
       ip_protocol_deliver_rcu+0x36/0x180
       ip_local_deliver+0x15b/0x1a0
       ip_rcv+0xac/0xd0
       __netif_receive_skb_one_core+0x7f/0xb0
       __netif_receive_skb+0x33/0xc0
       netif_receive_skb_internal+0x84/0x1c0
       napi_gro_receive+0x2a0/0x300
       receive_buf+0x3d4/0x2350
       ? detach_buf_split+0x159/0x390
       virtnet_poll+0x198/0x840
       ? reweight_entity+0x243/0x4b0
       net_rx_action+0x25c/0x770
       __do_softirq+0x19b/0x66d
       irq_exit+0x1eb/0x230
       do_IRQ+0x7a/0x150
       common_interrupt+0xf/0xf
       </IRQ>
      
      It can be reproduced by:
        echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
      
      Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19fad20d
    • L
      Merge tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi · fe5cdef2
      Linus Torvalds 提交于
      Pull IPMI fixes from Corey Minyard:
       "Fixes for some bugs cause by recent changes. One crash if you feed bad
        data to the module parameters, one BUG that sometimes occurs when a
        user closes the connection, and one bug that cause the driver to not
        work if the configuration information only comes in from SMBIOS"
      
      * tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi:
        ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
        ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
        ipmi: Fix failure on SMBIOS specified devices
      fe5cdef2
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 2a3a028f
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Handle init flow failures properly in iwlwifi driver, from Shahar S
          Matityahu.
      
       2) mac80211 TXQs need to be unscheduled on powersave start, from Felix
          Fietkau.
      
       3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau.
      
       4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed.
      
       5) Avoid checksum complete with XDP in mlx5, also from Saeed.
      
       6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon.
      
       7) Partial sent TLS record leak fix from Jakub Kicinski.
      
       8) Reject zero size iova range in vhost, from Jason Wang.
      
       9) Allow pending work to complete before clcsock release from Karsten
          Graul.
      
      10) Fix XDP handling max MTU in thunderx, from Matteo Croce.
      
      11) A lot of protocols look at the sa_family field of a sockaddr before
          validating it's length is large enough, from Tetsuo Handa.
      
      12) Don't write to free'd pointer in qede ptp error path, from Colin Ian
          King.
      
      13) Have to recompile IP options in ipv4_link_failure because it can be
          invoked from ARP, from Stephen Suryaputra.
      
      14) Doorbell handling fixes in qed from Denis Bolotin.
      
      15) Revert net-sysfs kobject register leak fix, it causes new problems.
          From Wang Hai.
      
      16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva.
      
      17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay
          Aleksandrov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits)
        socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW
        tcp: tcp_grow_window() needs to respect tcp_space()
        ocelot: Clean up stats update deferred work
        ocelot: Don't sleep in atomic context (irqs_disabled())
        net: bridge: fix netlink export of vlan_stats_per_port option
        qed: fix spelling mistake "faspath" -> "fastpath"
        tipc: set sysctl_tipc_rmem and named_timeout right range
        tipc: fix link established but not in session
        net: Fix missing meta data in skb with vlan packet
        net: atm: Fix potential Spectre v1 vulnerabilities
        net/core: work around section mismatch warning for ptp_classifier
        net: bridge: fix per-port af_packet sockets
        bnx2x: fix spelling mistake "dicline" -> "decline"
        route: Avoid crash from dereferencing NULL rt->from
        MAINTAINERS: normalize Woojung Huh's email address
        bonding: fix event handling for stacked bonds
        Revert "net-sysfs: Fix memory leak in netdev_register_kobject"
        rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check
        qed: Fix the DORQ's attentions handling
        qed: Fix missing DORQ attentions
        ...
      2a3a028f
  3. 17 4月, 2019 16 次提交
  4. 16 4月, 2019 15 次提交
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b5de3c50
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
       "5.1 keeps its reputation as a big bugfix release for KVM x86.
      
         - Fix for a memory leak introduced during the merge window
      
         - Fixes for nested VMX with ept=0
      
         - Fixes for AMD (APIC virtualization, NMI injection)
      
         - Fixes for Hyper-V under KVM and KVM under Hyper-V
      
         - Fixes for 32-bit SMM and tests for SMM virtualization
      
         - More array_index_nospec peppering"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
        KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
        KVM: fix spectrev1 gadgets
        KVM: x86: fix warning Using plain integer as NULL pointer
        selftests: kvm: add a selftest for SMM
        selftests: kvm: fix for compilers that do not support -no-pie
        selftests: kvm/evmcs_test: complete I/O before migrating guest state
        KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
        KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
        KVM: x86: clear SMM flags before loading state while leaving SMM
        KVM: x86: Open code kvm_set_hflags
        KVM: x86: Load SMRAM in a single shot when leaving SMM
        KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
        KVM: x86: Raise #GP when guest vCPU do not support PMU
        x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
        KVM: x86: svm: make sure NMI is injected after nmi_singlestep
        svm/avic: Fix invalidate logical APIC id entry
        Revert "svm: Fix AVIC incomplete IPI emulation"
        kvm: mmu: Fix overflow on kvm mmu page limit calculation
        KVM: nVMX: always use early vmcs check when EPT is disabled
        KVM: nVMX: allow tests to use bad virtual-APIC page address
        ...
      b5de3c50
    • V
      KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing · 7a223e06
      Vitaly Kuznetsov 提交于
      In __apic_accept_irq() interface trig_mode is int and actually on some code
      paths it is set above u8:
      
      kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
      is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
      (1 << 15) & e->msi.data
      
      kvm_apic_local_deliver sets it to reg & (1 << 15).
      
      Fix the immediate issue by making 'tm' into u16. We may also want to adjust
      __apic_accept_irq() interface and use proper sizes for vector, level,
      trig_mode but this is not urgent.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7a223e06
    • P
      KVM: fix spectrev1 gadgets · 1d487e9b
      Paolo Bonzini 提交于
      These were found with smatch, and then generalized when applicable.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1d487e9b
    • H
      KVM: x86: fix warning Using plain integer as NULL pointer · be43c440
      Hariprasad Kelam 提交于
      Changed passing argument as "0 to NULL" which resolves below sparse warning
      
      arch/x86/kvm/x86.c:3096:61: warning: Using plain integer as NULL pointer
      Signed-off-by: NHariprasad Kelam <hariprasad.kelam@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      be43c440
    • V
      selftests: kvm: add a selftest for SMM · 79904c9d
      Vitaly Kuznetsov 提交于
      Add a simple test for SMM, based on VMX.  The test implements its own
      sync between the guest and the host as using our ucall library seems to
      be too cumbersome: SMI handler is happening in real-address mode.
      
      This patch also fixes KVM_SET_NESTED_STATE to happen after
      KVM_SET_VCPU_EVENTS, in fact it places it last.  This is because
      KVM needs to know whether the processor is in SMM or not.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      79904c9d
    • P
      selftests: kvm: fix for compilers that do not support -no-pie · c2390f16
      Paolo Bonzini 提交于
      -no-pie was added to GCC at the same time as their configuration option
      --enable-default-pie.  Compilers that were built before do not have
      -no-pie, but they also do not need it.  Detect the option at build
      time.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c2390f16
    • P
      selftests: kvm/evmcs_test: complete I/O before migrating guest state · c68c21ca
      Paolo Bonzini 提交于
      Starting state migration after an IO exit without first completing IO
      may result in test failures.  We already have two tests that need this
      (this patch in fact fixes evmcs_test, similar to what was fixed for
      state_test in commit 0f73bbc8, "KVM: selftests: complete IO before
      migrating guest state", 2019-03-13) and a third is coming.  So, move the
      code to vcpu_save_state, and while at it do not access register state
      until after I/O is complete.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c68c21ca
    • S
      KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels · b68f3cc7
      Sean Christopherson 提交于
      Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
      trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
      rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
      thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
      
      KVM allows userspace to report long mode support via CPUID, even though
      the guest is all but guaranteed to crash if it actually tries to enable
      long mode.  But, a pure 32-bit guest that is ignorant of long mode will
      happily plod along.
      
      SMM complicates things as 64-bit CPUs use a different SMRAM save state
      area.  KVM handles this correctly for 64-bit kernels, e.g. uses the
      legacy save state map if userspace has hid long mode from the guest,
      but doesn't fare well when userspace reports long mode support on a
      32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
      
      Since the alternative is to crash the guest, e.g. by not loading state
      or explicitly requesting shutdown, unconditionally use the legacy SMRAM
      save state map for 32-bit KVM.  If a guest has managed to get far enough
      to handle SMIs when running under a weird/buggy userspace hypervisor,
      then don't deliberately crash the guest since there are no downsides
      (from KVM's perspective) to allow it to continue running.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b68f3cc7
    • S
      KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU · 8f4dc2e7
      Sean Christopherson 提交于
      Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
      state area, i.e. don't save/restore EFER across SMM transitions.  KVM
      somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
      guest doesn't support long mode.  But during RSM, KVM unconditionally
      clears EFER so that it can get back to pure 32-bit mode in order to
      start loading CRs with their actual non-SMM values.
      
      Clear EFER only when it will be written when loading the non-SMM state
      so as to preserve bits that can theoretically be set on 32-bit vCPUs,
      e.g. KVM always emulates EFER_SCE.
      
      And because CR4.PAE is cleared only to play nice with EFER, wrap that
      code in the long mode check as well.  Note, this may result in a
      compiler warning about cr4 being consumed uninitialized.  Re-read CR4
      even though it's technically unnecessary, as doing so allows for more
      readable code and RSM emulation is not a performance critical path.
      
      Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8f4dc2e7
    • S
      KVM: x86: clear SMM flags before loading state while leaving SMM · 9ec19493
      Sean Christopherson 提交于
      RSM emulation is currently broken on VMX when the interrupted guest has
      CR4.VMXE=1.  Stop dancing around the issue of HF_SMM_MASK being set when
      loading SMSTATE into architectural state, e.g. by toggling it for
      problematic flows, and simply clear HF_SMM_MASK prior to loading
      architectural state (from SMRAM save state area).
      Reported-by: NJon Doron <arilou@gmail.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Fixes: 5bea5123 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9ec19493
    • S
      KVM: x86: Open code kvm_set_hflags · c5833c7a
      Sean Christopherson 提交于
      Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM
      save state map, i.e. kvm_smm_changed() needs to be called after state
      has been loaded and so cannot be done automatically when setting
      hflags from RSM.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c5833c7a
    • S
      KVM: x86: Load SMRAM in a single shot when leaving SMM · ed19321f
      Sean Christopherson 提交于
      RSM emulation is currently broken on VMX when the interrupted guest has
      CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
      when loading SMSTATE into architectural state, ideally RSM emulation
      itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
      architectural state.
      
      Ostensibly, the only motivation for having HF_SMM_MASK set throughout
      the loading of state from the SMRAM save state area is so that the
      memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
      all of the SMRAM save state area from guest memory at the beginning of
      RSM emulation, and load state from the buffer instead of reading guest
      memory one-by-one.
      
      This paves the way for clearing HF_SMM_MASK prior to loading state,
      and also aligns RSM with the enter_smm() behavior, which fills a
      buffer and writes SMRAM save state in a single go.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ed19321f
    • L
      KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU · e51bfdb6
      Liran Alon 提交于
      Issue was discovered when running kvm-unit-tests on KVM running as L1 on
      top of Hyper-V.
      
      When vmx_instruction_intercept unit-test attempts to run RDPMC to test
      RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
      handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
      Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.
      
      The reason vmx_instruction_intercept unit-test attempts to run RDPMC
      even though Hyper-V doesn't support PMU is because L1 expose to L2
      support for RDPMC-exiting. Which is reasonable to assume that is
      supported only in case CPU supports PMU to being with.
      
      Above issue can easily be simulated by modifying
      vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
      "-cpu host,+vmx,-pmu" and run unit-test.
      
      To handle issue, change KVM to expose RDPMC-exiting only when guest
      supports PMU.
      Reported-by: NSaar Amar <saaramar@microsoft.com>
      Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e51bfdb6
    • L
      KVM: x86: Raise #GP when guest vCPU do not support PMU · 672ff6cf
      Liran Alon 提交于
      Before this change, reading a VMware pseduo PMC will succeed even when
      PMU is not supported by guest. This can easily be seen by running
      kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option.
      Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      672ff6cf
    • W
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 1811d979
      WANG Chao 提交于
      guest xcr0 could leak into host when MCE happens in guest mode. Because
      do_machine_check() could schedule out at a few places.
      
      For example:
      
      kvm_load_guest_xcr0
      ...
      kvm_x86_ops->run(vcpu) {
        vmx_vcpu_run
          vmx_complete_atomic_exit
            kvm_machine_check
              do_machine_check
                do_memory_failure
                  memory_failure
                    lock_page
      
      In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
      out, host cpu has guest xcr0 loaded (0xff).
      
      In __switch_to {
           switch_fpu_finish
             copy_kernel_to_fpregs
               XRSTORS
      
      If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
      generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
      and tries to reinitialize fpu by restoring init fpu state. Same story as
      last #GP, except we get DOUBLE FAULT this time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1811d979