1. 12 Dec, 2020: 1 commit
  2. 06 Dec, 2020: 3 commits
  3. 04 Dec, 2020: 1 commit
  4. 03 Dec, 2020: 2 commits
  5. 02 Dec, 2020: 1 commit
  6. 01 Dec, 2020: 1 commit
  7. 28 Nov, 2020: 2 commits
    • x86/mce: Do not overwrite no_way_out if mce_end() fails · 25bc65d8
      Authored by Gabriele Paoloni
      Currently, if mce_end() fails, no_way_out - the variable denoting
      whether the machine can recover from this MCE - is determined by whether
      the worst severity found across the MCA banks associated with the
      current CPU is of panic severity.
      
      However, at this point no_way_out could have already been set by
      mce_start() after looking at all severities of all CPUs that entered the
      MCE handler. If mce_end() fails, check first if no_way_out is already
      set and, if so, stick to it; otherwise use the local worst value.
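
      A minimal sketch of the resulting logic in do_machine_check()
      (illustrative of the description above; variable names such as 'worst'
      and 'order' follow the existing handler):

        if (mce_end(order) < 0) {
                /* Preserve a no_way_out already established globally by
                 * mce_start(); only fall back to the local worst severity. */
                if (!no_way_out)
                        no_way_out = worst >= MCE_PANIC_SEVERITY;
        }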
      
       [ bp: Massage. ]
      Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com
    • kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT · 9a2a0d3c
      Authored by Vitaly Kuznetsov
      Commit 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
      the following WARNING on an Intel Ice Lake CPU:
      
       get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
       ------ spte 0xb80a0 level 5.
       ------ spte 0xfcd210107 level 4.
       ------ spte 0x1004c40107 level 3.
       ------ spte 0x1004c41107 level 2.
       ------ spte 0x1db00000000b83b6 level 1.
       WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
      ...
       Call Trace:
        ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
        vcpu_enter_guest+0xaa1/0x16a0 [kvm]
        ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
        ? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]
      
      The guest that triggers this crashes. Note, this happens with the
      traditional MMU and EPT enabled, not with the newly introduced TDP MMU.
      It turns out there was a subtle change in the above-mentioned commit.
      Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to
      'iterator.level', which is returned by shadow_walk_init() and equals
      'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
      'int root = vcpu->arch.mmu->root_level'.
      
      The difference between 'root_level' and 'shadow_root_level' on CPUs
      supporting 5-level page tables is that in some cases we don't want to
      use 5 levels; in particular, when 'cpuid_maxphyaddr(vcpu) <= 48',
      kvm_mmu_get_tdp_level() returns '4'. In case the upper layer is not
      used, the corresponding SPTE will fail the '__is_rsvd_bits_set()' check.
      
      Revert to using 'shadow_root_level'.
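
      A minimal sketch of the revert described above, in get_mmio_spte()
      (illustrative, not the full hunk):

        /* Walk from the level the shadow MMU actually uses for its root;
         * root_level may be smaller (e.g. 4) while the hardware walk still
         * starts at a 5-level root. */
        int root = vcpu->arch.mmu->shadow_root_level;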
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 27 Nov, 2020: 2 commits
    • KVM: x86: Fix split-irqchip vs interrupt injection window request · 71cc849b
      Authored by Paolo Bonzini
      kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
      a hodge-podge of conditions, hacked together to get something that
      more or less works.  But what is actually needed is much simpler;
      in both cases the fundamental question is, do we have a place to stash
      an interrupt if userspace does KVM_INTERRUPT?
      
      In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
      Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
      unnecessarily restrictive.
      
      In split irqchip mode it's a bit more complicated: we need to check
      kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
      cycle and thus requires ExtINTs not to be masked) as well as
      !pending_userspace_extint(vcpu).  However, there is no need to
      check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
      pending ExtINT state separate from event injection state, and checking
      kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
      priority than APIC interrupts.  In fact the latter fixes a bug:
      when userspace requests an IRQ window vmexit, an interrupt in the
      local APIC can cause kvm_cpu_has_interrupt() to be true and thus
      kvm_vcpu_ready_for_interrupt_injection() to return false.  When this
      happens, vcpu_run does not exit to userspace but the interrupt window
      vmexits keep occurring.  The VM loops without any hope of making progress.
      
      Once we try to fix these with something like
      
           return kvm_arch_interrupt_allowed(vcpu) &&
      -        !kvm_cpu_has_interrupt(vcpu) &&
      -        !kvm_event_needs_reinjection(vcpu) &&
      -        kvm_cpu_accept_dm_intr(vcpu);
      +        (!lapic_in_kernel(vcpu)
      +         ? !vcpu->arch.interrupt.injected
      +         : (kvm_apic_accept_pic_intr(vcpu)
      +            && !pending_userspace_extint(v)));
      
      we realize two things.  First, thanks to the previous patch the complex
      conditional can reuse !kvm_cpu_has_extint(vcpu).  Second, the interrupt
      window request in vcpu_enter_guest()
      
              bool req_int_win =
                      dm_request_for_irq_injection(vcpu) &&
                      kvm_cpu_accept_dm_intr(vcpu);
      
      should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
      it is unnecessary to ask the processor for an interrupt window
      if we would not be able to return to userspace.  Therefore,
      kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
      ANDed with the existing check for masked ExtINT.  It all makes sense:
      
      - we can accept an interrupt from userspace if there is a place
        to stash it (and, for irqchip split, ExtINTs are not masked).
        Interrupts from userspace _can_ be accepted even if right now
        EFLAGS.IF=0.
      
      - in order to tell userspace we will inject its interrupt ("IRQ
        window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
        KVM and the vCPU need to be ready to accept the interrupt.
      
      ... and this is what the patch implements.
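
      A sketch of roughly what the two helpers look like after this change
      (condensed; comments paraphrase the reasoning above, and this is not
      the literal hunk):

        static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
        {
                /* A pending ExtINT already occupies the place where a
                 * userspace interrupt would be stashed. */
                if (kvm_cpu_has_extint(vcpu))
                        return false;

                /* With split irqchip, the IRQ window exit is an INTACK cycle,
                 * so ExtINTs must not be masked. */
                return !lapic_in_kernel(vcpu) || kvm_apic_accept_pic_intr(vcpu);
        }

        static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
        {
                return kvm_arch_interrupt_allowed(vcpu) &&
                        kvm_cpu_accept_dm_intr(vcpu);
        }
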
      Reported-by: David Woodhouse <dwmw@amazon.co.uk>
      Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Tested-by: David Woodhouse <dwmw@amazon.co.uk>
    • KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint · 72c3bcdc
      Authored by Paolo Bonzini
      Centralize handling of interrupts from the userspace APIC
      in kvm_cpu_has_extint and kvm_cpu_get_extint, since
      userspace APIC interrupts are handled more or less the
      same as ExtINTs are with split irqchip.  This removes
      duplicated code from kvm_cpu_has_injectable_intr and
      kvm_cpu_has_interrupt, and makes the code more similar
      between kvm_cpu_has_{extint,interrupt} on one side
      and kvm_cpu_get_{extint,interrupt} on the other.
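
      A condensed sketch of the centralized check (simplified; the in-kernel
      LAPIC masking check is omitted for brevity, so this is illustrative
      rather than the exact resulting function):

        static int kvm_cpu_has_extint(struct kvm_vcpu *v)
        {
                /* Userspace APIC: the pending event is whatever userspace
                 * stashed via KVM_INTERRUPT. */
                if (!lapic_in_kernel(v))
                        return v->arch.interrupt.injected;

                /* Split irqchip: ExtINT state is tracked separately in KVM. */
                if (irqchip_split(v->kvm))
                        return pending_userspace_extint(v);

                /* Full in-kernel irqchip: ask the emulated PIC. */
                return v->kvm->arch.vpic->output;
        }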
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Filippo Sironi <sironi@amazon.de>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Tested-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 26 Nov, 2020: 1 commit
    • x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb · 33fc379d
      Authored by Anand K Mistry
      When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
      line, IBPB is force-enabled and STIBP is conditionally enabled (or not
      available).
      
      However, since
      
        21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
      
      the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
      instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
      Because the issuing of IBPB relies on the switch_mm_*_ibpb static
      branches, the mitigations behave as expected.
      
      Since
      
        1978b3a5 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
      
      this discrepancy caused the misreporting of IB speculation via prctl().
      
      On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
      prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
      PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIBP are
      always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
      speculation mode, even though the flag is ignored.
      
      Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
      also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
      available.
      
       [ bp: Massage commit message. ]
      
      Fixes: 21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
      Fixes: 1978b3a5 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
      Signed-off-by: Anand K Mistry <amistry@google.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid
  10. 25 Nov, 2020: 1 commit
  11. 24 Nov, 2020: 3 commits
    • sched/idle: Fix arch_cpu_idle() vs tracing · 58c644ba
      Authored by Peter Zijlstra
      We call arch_cpu_idle() with RCU disabled, but then use
      local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
      
      Switch all arch_cpu_idle() implementations to use
      raw_local_irq_{en,dis}able() and carefully manage the
      lockdep, RCU, and tracing state like we do in entry.
      
      (XXX: we really should change arch_cpu_idle() to not return with
      interrupts enabled)
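
      A generic sketch of the per-architecture conversion (the idle
      instruction is represented by a hypothetical arch_idle_insn(); real
      implementations use hlt, wfi, etc. and differ in ordering):

        void arch_cpu_idle(void)
        {
                /* Enable interrupts without the tracepoints: RCU is not
                 * watching here, so the traced helpers must not be used. */
                raw_local_irq_enable();         /* was local_irq_enable() */
                arch_idle_insn();               /* hypothetical stand-in */
        }
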
      Reported-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
    • x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak · 75899924
      Authored by Xiaochen Shen
      On resource group creation via a mkdir an extra kernfs_node reference is
      obtained by kernfs_get() to ensure that the rdtgroup structure remains
      accessible for the rdtgroup_kn_unlock() calls where it is removed on
      deletion. Currently the extra kernfs_node reference count is only
      dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
      structure is removed in a few other locations that lack the matching
      reference drop.
      
      In call paths of rmdir and umount, when a control group is removed,
      kernfs_remove() is called to remove the whole kernfs nodes tree of the
      control group (including the kernfs nodes trees of all child monitoring
      groups), and then the rdtgroup structure is freed by kfree(). The rdtgroup
      structures of all child monitoring groups under the control group are
      freed by kfree() in free_all_child_rdtgrp().
      
      Before calling kfree() to free the rdtgroup structures, the kernfs node
      of the control group itself as well as the kernfs nodes of all child
      monitoring groups still hold the extra references, which will never be
      dropped to 0, so the kernfs nodes will never be freed. This leads to a
      reference count leak and a kernfs_node_cache memory leak.
      
      For example, reference count leak is observed in these two cases:
        (1) mount -t resctrl resctrl /sys/fs/resctrl
            mkdir /sys/fs/resctrl/c1
            mkdir /sys/fs/resctrl/c1/mon_groups/m1
            umount /sys/fs/resctrl
      
        (2) mkdir /sys/fs/resctrl/c1
            mkdir /sys/fs/resctrl/c1/mon_groups/m1
            rmdir /sys/fs/resctrl/c1
      
      The same reference count leak issue also exists in the error exit paths
      of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
      
      Fix this issue with the following changes to make sure the extra
      kernfs_node reference on the rdtgroup is dropped before freeing the
      rdtgroup structure:
        (1) Introduce the rdtgroup removal helper rdtgroup_remove() to wrap up
        kernfs_put() and kfree() (a minimal sketch follows this list).
      
        (2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
        structure is about to be freed by kfree().
      
        (3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
        exit paths of mkdir where an extra reference is taken by kernfs_get().
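
      A minimal sketch of the helper introduced in (1):

        static void rdtgroup_remove(struct rdtgroup *rdtgrp)
        {
                /* Drop the extra reference taken at creation, then free. */
                kernfs_put(rdtgrp->kn);
                kfree(rdtgrp);
        }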
      
      Fixes: f3cbeaca ("x86/intel_rdt/cqm: Add rmdir support")
      Fixes: e02737d5 ("x86/intel_rdt: Add tasks files")
      Fixes: 60cf5e10 ("x86/intel_rdt: Add mkdir to resctrl file system")
      Reported-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com
    • x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak · fd8d9db3
      Authored by Xiaochen Shen
      Willem reported growing of kernfs_node_cache entries in slabtop when
      repeatedly creating and removing resctrl subdirectories as well as when
      repeatedly mounting and unmounting the resctrl filesystem.
      
      On resource group (control as well as monitoring) creation via a mkdir
      an extra kernfs_node reference is obtained to ensure that the rdtgroup
      structure remains accessible for the rdtgroup_kn_unlock() calls where it
      is removed on deletion. The kernfs_node reference count is dropped by
      kernfs_put() in rdtgroup_kn_unlock().
      
      With the above explaining the need for one kernfs_get()/kernfs_put()
      pair in resctrl, there are more places where a kernfs_node reference is
      obtained without a corresponding release. These excess references on
      kernfs nodes will never be dropped to 0, so the kernfs nodes will never
      be freed in the call paths of rmdir and umount. This leads to a
      reference count leak and a kernfs_node_cache memory leak.
      
      Remove the superfluous kernfs_get() calls and expand the existing
      comments surrounding the one kernfs_get()/kernfs_put() pair that
      remains in use.
      
      Superfluous kernfs_get() calls are removed from two areas:
      
        (1) In call paths of mount and mkdir, when kernfs nodes for "info",
        "mon_groups" and "mon_data" directories and sub-directories are
        created, the reference count of the newly created kernfs node is set
        to 1. But after kernfs_create_dir() returns, superfluous kernfs_get()
        calls are made to take an additional reference (a small before/after
        sketch follows this list).
      
        (2) kernfs_get() calls in rmdir call paths.
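
      An illustrative before/after for (1): kernfs_create_dir() already
      returns a node whose reference count is 1, so the extra get is not
      needed (names follow the existing resctrl directory-creation call
      sites and are illustrative):

        kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, priv);
        if (IS_ERR(kn))
                return PTR_ERR(kn);
        /* removed: kernfs_get(kn);  -- superfluous second reference */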
      
      Fixes: 17eafd07 ("x86/intel_rdt: Split resource group removal in two")
      Fixes: 4af4a88e ("x86/intel_rdt/cqm: Add mount,umount support")
      Fixes: f3cbeaca ("x86/intel_rdt/cqm: Add rmdir support")
      Fixes: d89b7379 ("x86/intel_rdt/cqm: Add mon_data")
      Fixes: c7d9aac6 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
      Fixes: 5dc1d5c6 ("x86/intel_rdt: Simplify info and base file lists")
      Fixes: 60cf5e10 ("x86/intel_rdt: Add mkdir to resctrl file system")
      Fixes: 4e978d06 ("x86/intel_rdt: Add "info" files to resctrl file system")
      Reported-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
      Tested-by: Willem de Bruijn <willemb@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com
  12. 23 Nov, 2020: 1 commit
  13. 18 Nov, 2020: 2 commits
  14. 17 Nov, 2020: 4 commits
  15. 15 Nov, 2020: 1 commit
    • kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Authored by Paolo Bonzini
      In some cases where shadow paging is in use, the root page will
      be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
      not have an associated struct kvm_mmu_page, because it is allocated
      with alloc_page instead of kvm_mmu_alloc_page.
      
      Just return false quickly from is_tdp_mmu_root if the TDP MMU is
      not in use, which also includes the case where shadow paging is
      enabled.
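
      A minimal sketch of the early exit (illustrative; only the added check
      is shown, the rest of the existing function body is elided):

        bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa)
        {
                /* With shadow paging (TDP MMU not in use) the root page has
                 * no struct kvm_mmu_page, so do not try to look one up. */
                if (!kvm->arch.tdp_mmu_enabled)
                        return false;
                /* ... existing lookup of the kvm_mmu_page for hpa ... */
        }
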
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  16. 13 Nov, 2020: 4 commits
  17. 11 Nov, 2020: 3 commits
  18. 10 Nov, 2020: 5 commits
  19. 09 Nov, 2020: 1 commit
    • x86/xen: don't unbind uninitialized lock_kicker_irq · 65cae188
      Authored by Brian Masney
      When booting a hyperthreaded system with the kernel parameter
      'mitigations=auto,nosmt', the following warning occurs:
      
          WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60
          ...
          Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
          ...
          Call Trace:
           xen_uninit_lock_cpu+0x28/0x62
           xen_hvm_cpu_die+0x21/0x30
           takedown_cpu+0x9c/0xe0
           ? trace_suspend_resume+0x60/0x60
           cpuhp_invoke_callback+0x9a/0x530
           _cpu_up+0x11a/0x130
           cpu_up+0x7e/0xc0
           bringup_nonboot_cpus+0x48/0x50
           smp_init+0x26/0x79
           kernel_init_freeable+0xea/0x229
           ? rest_init+0xaa/0xaa
           kernel_init+0xa/0x106
           ret_from_fork+0x35/0x40
      
      The secondary CPUs are not activated with the nosmt mitigations and only
      the primary thread on each CPU core is used. In this situation,
      xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is
      not called, so the lock_kicker_irq is not initialized for the secondary
      CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the
      irq is not set to avoid the warning from above for each secondary CPU.
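
      Roughly the resulting shape of xen_uninit_lock_cpu() with the early
      exit (a sketch; the field and helper names follow the existing Xen
      spinlock code):

        void xen_uninit_lock_cpu(int cpu)
        {
                int irq;

                if (!xen_pvspin)
                        return;

                /* With nosmt the secondary CPUs never ran xen_init_lock_cpu(),
                 * so there is no kicker irq to unbind. */
                irq = per_cpu(lock_kicker_irq, cpu);
                if (irq == -1)
                        return;

                unbind_from_irqhandler(irq, NULL);
                per_cpu(lock_kicker_irq, cpu) = -1;
                kfree(per_cpu(irq_name, cpu));
                per_cpu(irq_name, cpu) = NULL;
        }
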
      Signed-off-by: Brian Masney <bmasney@redhat.com>
      Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  20. 08 Nov, 2020: 1 commit