1. 24 Mar, 2019 1 commit
  2. 13 Feb, 2019 3 commits
  3. 17 Jan, 2019 1 commit
    • KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · 4f14f446
      Committed by Christoffer Dall
      commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.
      
      We recently addressed a VMID generation race by introducing a read/write
      lock around accesses and updates to the vmid generation values.
      
      However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
      does so without taking the read lock.
      
      As far as I can tell, this can lead to the same kind of race:
      
        VM 0, VCPU 0                    VM 0, VCPU 1
        ------------                    ------------
        update_vttbr (vmid 254)
                                        update_vttbr (vmid 1) // roll over
                                        read_lock(kvm_vmid_lock);
                                        force_vm_exit()
        local_irq_disable
        need_new_vmid_gen == false // because vmid gen matches

        enter_guest (vmid 254)
                                        kvm_arch.vttbr = <PGD>:<VMID 1>
                                        read_unlock(kvm_vmid_lock);

                                        enter_guest (vmid 1)
      
      Which results in running two VCPUs in the same VM with different VMIDs
      and (even worse) other VCPUs from other VMs could now allocate clashing
      VMID 254 from the new generation as long as VCPU 0 is not exiting.
      
      Attempt to solve this by making sure vttbr is updated before another CPU
      can observe the updated VMID generation.
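
The ordering described above can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions, not the kernel code: `struct vm_model`, `global_vmid_gen`, `publish_vmid()` and the 48-bit VMID shift are hypothetical names and values chosen for illustration, and release/acquire ordering stands in for whatever barriers the actual patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not the kernel's definitions. */
struct vm_model {
    uint64_t vttbr;             /* page-table base combined with the VMID */
    _Atomic uint64_t vmid_gen;  /* generation that the VMID in vttbr belongs to */
};

/* Global generation counter, bumped on every VMID roll-over. */
static _Atomic uint64_t global_vmid_gen = 1;

/* Reader: true when the VM's VMID comes from an older generation. */
static bool need_new_vmid_gen(struct vm_model *vm)
{
    uint64_t current = atomic_load_explicit(&global_vmid_gen,
                                            memory_order_acquire);
    return atomic_load_explicit(&vm->vmid_gen,
                                memory_order_acquire) != current;
}

/* Writer: store vttbr first, then publish the generation with release. */
static void publish_vmid(struct vm_model *vm, uint64_t pgd, uint64_t vmid)
{
    assert(vmid <= 0xffff);            /* assumed 16-bit VMID field */
    vm->vttbr = pgd | (vmid << 48);    /* assumed VMID position in vttbr */
    atomic_store_explicit(&vm->vmid_gen,
                          atomic_load_explicit(&global_vmid_gen,
                                               memory_order_relaxed),
                          memory_order_release);
}
```

In this model, a reader that sees a matching generation is also guaranteed to see the `vttbr` value written before it, which is the property the commit message asks for.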
      
      Cc: stable@vger.kernel.org
      Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race"
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 10 Jan, 2019 5 commits
  5. 14 Nov, 2018 2 commits
    • KVM: arm64: Fix caching of host MDCR_EL2 value · 59571785
      Committed by Mark Rutland
      commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream.
      
      At boot time, KVM stashes the host MDCR_EL2 value, but only does this
      when the kernel is not running in hyp mode (i.e. is non-VHE). In these
      cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
      lead to CONSTRAINED UNPREDICTABLE behaviour.
      
      Since we use this value to derive the MDCR_EL2 value when switching
      to/from a guest, after a guest has been run, the performance counters
      do not behave as expected. This has been observed to result in accesses
      via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
      counters, resulting in events not being counted. In these cases, only
      the fixed-purpose cycle counter appears to work as expected.
      
      Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
      
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
      Tested-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: arm/arm64: Ensure only THP is candidate for adjustment · 3e286d39
      Committed by Punit Agrawal
      commit fd2ef358 upstream.
      
      PageTransCompoundMap() returns true for both hugetlbfs and THP
      hugepages. As a result, stage 2 faults on unsupported hugepage sizes
      (e.g., a 64K hugepage with 4K pages) are incorrectly treated as THP
      faults.
      
      Tighten the check to filter out hugetlbfs pages. This also leads to
      consistently mapping all unsupported hugepage sizes as PTE level
      entries at stage 2.

      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Reviewed-by: Suzuki Poulose <suzuki.poulose@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 07 Sep, 2018 2 commits
  7. 23 Aug, 2018 1 commit
    • mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Committed by Michal Hocko
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start, and that is a problem for the
      oom_reaper because it needs to guarantee forward progress, so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover,
      the majority of notifiers only care about a portion of the address space
      and there is absolutely zero reason to fail when we are unmapping an
      unrelated range.  Many notifiers really do block and wait for HW, though,
      which is harder to handle, and in those cases we have to bail out.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continuing as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
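
The trylock-and-retry pattern described above can be illustrated with a small userspace sketch using POSIX mutexes. Everything here is hypothetical: `struct notifier_model`, `model_invalidate_range_start()`, `model_reap()`, the bounded retry count and the `-EAGAIN` return value are assumptions made for illustration, not the kernel's notifier API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical driver state guarded by a sleepable lock. */
struct notifier_model {
    pthread_mutex_t lock;
    int invalidations;  /* counts ranges that were actually processed */
};

/*
 * Model of an invalidate_range_start callback.
 * Returns 0 on success, or -EAGAIN when !blockable and the lock is contended.
 */
static int model_invalidate_range_start(struct notifier_model *n,
                                        bool blockable)
{
    if (blockable) {
        pthread_mutex_lock(&n->lock);       /* may sleep */
    } else if (pthread_mutex_trylock(&n->lock) != 0) {
        return -EAGAIN;                     /* would have to sleep: bail out */
    }
    n->invalidations++;
    pthread_mutex_unlock(&n->lock);
    return 0;
}

/* Model of the reaper side: retry a bounded number of times, never blocking. */
static bool model_reap(struct notifier_model *n, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (model_invalidate_range_start(n, false) == 0)
            return true;
        sched_yield();                      /* let the lock holder run */
    }
    return false;
}
```

The caller in non-blockable mode never waits on the lock; it either makes progress or reports that it could not, and the decision to retry stays with the reaper.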
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done, e.g., after the test faults in
      all the mmu notifier managed memory and sets the hard limit to something
      really small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 13 Aug, 2018 2 commits
  9. 12 Aug, 2018 3 commits
  10. 06 Aug, 2018 3 commits
  11. 31 Jul, 2018 2 commits
    • KVM: arm/arm64: Fix lost IRQs from emulated physical timer when blocked · 245715cb
      Committed by Christoffer Dall
      When the VCPU is blocked (for example from WFI) we don't inject the
      physical timer interrupt if it should fire while the CPU is blocked, but
      instead we just wake up the VCPU and expect kvm_timer_vcpu_load to take
      care of injecting the interrupt.
      
      Unfortunately, kvm_timer_vcpu_load() doesn't actually do that; it only
      supports scheduling a soft timer if the emulated phys timer is
      expected to fire in the future.
      
      Follow the same pattern as kvm_timer_update_state() and update the irq
      state after potentially scheduling a soft timer.

      Reported-by: Andre Przywara <andre.przywara@arm.com>
      Cc: Stable <stable@vger.kernel.org> # 4.15+
      Fixes: bbdd52cf ("KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit")
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Fix potential loss of ptimer interrupts · 7afc4ddb
      Committed by Christoffer Dall
      kvm_timer_update_state() is called when changing the phys timer
      configuration registers, either via vcpu reset, as a result of a trap
      from the guest, or when userspace programs the registers.
      
      phys_timer_emulate() is in turn called by kvm_timer_update_state() to
      either cancel an existing software timer, or program a new software
      timer, to emulate the behavior of a real phys timer, based on the change
      in configuration registers.
      
      Unfortunately, the interaction between these two functions left a small
      race; if the conceptual emulated phys timer should actually fire, but
      the soft timer hasn't executed its callback yet, we cancel the timer in
      phys_timer_emulate without injecting an irq.  This only happens if the
      check in kvm_timer_update_state is called before the timer should fire,
      which is relatively unlikely, but possible.
      
      The solution is to update the state of the phys timer after calling
      phys_timer_emulate, which will pick up the pending timer state and
      update the interrupt value.
      
      Note that this leaves the opportunity of raising the interrupt twice,
      once in the just-programmed soft timer, and once in
      kvm_timer_update_state.  Since this always happens synchronously with
      the VCPU execution, there is no harm in this, and the guest only ever
      sees a single timer interrupt.
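
The ordering fix can be pictured with a tiny state model. This is a sketch under stated assumptions rather than the kernel's timer code: `struct ptimer_model` and the `model_*` functions are hypothetical, time is a plain counter, and the soft timer is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an emulated physical timer; not the kernel's structs. */
struct ptimer_model {
    bool enabled;
    uint64_t cval;          /* compare value: the timer fires when now >= cval */
    bool soft_timer_armed;  /* stands in for the software timer */
    bool irq_level;         /* interrupt line as seen by the guest */
};

static bool model_should_fire(const struct ptimer_model *t, uint64_t now)
{
    return t->enabled && now >= t->cval;
}

/* Arm a soft timer only when expiry lies in the future; otherwise cancel it. */
static void model_phys_timer_emulate(struct ptimer_model *t, uint64_t now)
{
    t->soft_timer_armed = t->enabled && now < t->cval;
}

/*
 * Fixed ordering: emulate first, then refresh the interrupt level, so that a
 * timer which is already due is not lost when its soft timer gets cancelled.
 */
static void model_update_state(struct ptimer_model *t, uint64_t now)
{
    assert(t != NULL);
    model_phys_timer_emulate(t, now);
    t->irq_level = model_should_fire(t, now);
}
```

If the interrupt level were computed only before `model_phys_timer_emulate()`, a timer that became due in between would have its soft timer cancelled with no interrupt raised, which is the window the commit message describes.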
      
      Cc: Stable <stable@vger.kernel.org> # 4.15+
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  12. 24 Jul, 2018 1 commit
    • KVM: arm/arm64: vgic: Fix possible spectre-v1 write in vgic_mmio_write_apr() · 6b8b9a48
      Committed by Mark Rutland
      It's possible for userspace to control n. Sanitize n when using it as an
      array index, to inhibit the potential spectre-v1 write gadget.
      
      Note that while it appears that n must be bound to the interval [0,3]
      due to the way it is extracted from addr, we cannot guarantee that
      compiler transformations (and/or future refactoring) will ensure this is
      the case, and given this is a slow path it's better to always perform
      the masking.
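
The kind of masking referred to above can be sketched in portable C as a branch-free clamp. The kernel provides the `array_index_nospec()` helper for this purpose; the functions below are a hypothetical userspace model of the idea, not that helper, and they assume that `size` is non-zero and fits in a `long`, and that right-shifting a negative `long` is an arithmetic shift (true for GCC and Clang).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Branch-free mask: all ones when index < size, zero otherwise.
 * No conditional branch is involved, so there is nothing for the CPU
 * to mispredict when the index is out of bounds.
 */
static unsigned long model_index_mask(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp index into [0, size): out-of-range values collapse to 0. */
static unsigned long model_index_nospec(unsigned long index,
                                        unsigned long size)
{
    assert(size > 0 && size <= (unsigned long)LONG_MAX);
    return index & model_index_mask(index, size);
}
```

With a four-entry array, as in the interval [0,3] mentioned above, an in-range `n` passes through unchanged, while any other value is forced to 0 before it is used as an index.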
      
      Found by smatch.

      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  13. 21 Jul, 2018 14 commits