1. 17 December 2022, 11 commits
  2. 30 January 2022, 3 commits
    • KVM: Move the memslot update in-progress flag to bit 63 · e272dde6
      Sean Christopherson authored
      mainline inclusion
      from mainline-v5.1-rc1
      commit 164bf7e5
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4MKP4
      CVE: NA
      
      --------------------------------
      
      ...now that KVM won't explode by moving it out of bit 0.  Using bit 63
      eliminates the need to jump over bit 0, e.g. when calculating a new
      memslots generation or when propagating the memslots generation to an
      MMIO spte.
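      
      As a rough sketch of what the move enables (the flag macro below matches
      the mainline define, but the helper is illustrative only, not the actual
      KVM update path):
      
        #include <stdint.h>
        
        #define KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS (UINT64_C(1) << 63)
        
        /* With the flag parked in bit 63, bumping the generation is a plain
         * increment of the low bits; there is no bit 0 to jump over. */
        static uint64_t next_memslots_generation(uint64_t gen)
        {
            return (gen & ~KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS) + 1;
        }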
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> #openEuler_contributor
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • KVM: Remove the hack to trigger memslot generation wraparound · 53e9bbf6
      Sean Christopherson authored
      mainline inclusion
      from mainline-v5.1-rc1
      commit 0e32958e
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4MKP4
      CVE: NA
      
      --------------------------------
      
      x86 captures a subset of the memslot generation (19 bits) in its MMIO
      sptes so that it can expedite emulated MMIO handling by checking only
      the relevant spte, i.e. it doesn't need to do a full page fault walk.
      
      Because the MMIO sptes capture only 19 bits (due to limited space in
      the sptes), there is a non-zero probability that the MMIO generation
      could wrap, e.g. after 500k memslot updates.  Since normal usage is
      extremely unlikely to result in 500k memslot updates, a hack was added
      by commit 69c9ea93 ("KVM: MMU: init kvm generation close to mmio
      wrap-around value") to offset the MMIO generation in order to trigger
      a wraparound, e.g. after 150 memslot updates.
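      
      The wrap arithmetic is easy to check standalone (a sketch; the macro
      names are illustrative and the 150-update figure is the one quoted
      above):
      
        #include <stdio.h>
        
        #define MMIO_GEN_BITS 19
        #define MMIO_GEN_MASK ((1u << MMIO_GEN_BITS) - 1)
        
        int main(void)
        {
            /* 2^19 = 524288, i.e. the "500k memslot updates" above. */
            printf("natural wrap after %u updates\n", MMIO_GEN_MASK + 1);
        
            /* The hack biased the starting generation so that the wrap
             * hit after roughly 150 updates instead. */
            unsigned int biased_start = MMIO_GEN_MASK + 1 - 150;
            printf("biased start %u wraps after %u updates\n",
                   biased_start, MMIO_GEN_MASK + 1 - biased_start);
            return 0;
        }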
      
      When separate memslot generation sequences were assigned to each
      address space, commit 00f034a1 ("KVM: do not bias the generation
      number in kvm_current_mmio_generation") moved the offset logic into the
      initialization of the memslot generation itself so that the per-address
      space bit(s) were not dropped/corrupted by the MMIO shenanigans.
      
      Remove the offset hack for three reasons:
      
        - While it does exercise x86's kvm_mmu_invalidate_mmio_sptes(), simply
          wrapping the generation doesn't actually test the interesting case
          of having stale MMIO sptes with the new generation number, e.g. old
          sptes with a generation number of 0.
      
        - Triggering kvm_mmu_invalidate_mmio_sptes() prematurely makes its
          performance rather important since the probability of invalidating
          MMIO sptes jumps from "effectively never" to "fairly likely".  This
          limits what can be done in future patches, e.g. to simplify the
          invalidation code, as doing so without proper caution could lead to
          a noticeable performance regression.
      
        - Forcing the memslots generation, which is a 64-bit number, to wrap
          prevents KVM from assuming the memslots generation will never wrap.
          This in turn prevents KVM from using an arbitrary bit for the
          "update in-progress" flag, e.g. using bit 63 would immediately
          collide with using a large value as the starting generation number.
          The "update in-progress" flag is effectively forced into bit 0 so
          that it's (subtly) taken into account when incrementing the
          generation.
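      
      The collision in the last point can be demonstrated in a few lines (a
      standalone sketch; the biased start value is hypothetical, chosen only
      to force an early 64-bit wrap):
      
        #include <assert.h>
        #include <stdint.h>
        
        int main(void)
        {
            /* A starting generation near the 64-bit wrap point, as forcing
             * a wraparound would require... */
            uint64_t start_gen = UINT64_MAX - 150;
            uint64_t flag_bit63 = UINT64_C(1) << 63;
        
            /* ...already has bit 63 set, so a flag could never live there;
             * bit 0 is the only bit such a scheme leaves free. */
            assert(start_gen & flag_bit63);
            return 0;
        }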
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> #openEuler_contributor
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • KVM: Explicitly define the "memslot update in-progress" bit · 6cd5d909
      Sean Christopherson authored
      mainline inclusion
      from mainline-v5.1-rc1
      commit 361209e0
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4MKP4
      CVE: NA
      
      --------------------------------
      
      KVM uses bit 0 of the memslots generation as an "update in-progress"
      flag, which is used by x86 to prevent caching MMIO access while the
      memslots are changing.  Although the intended behavior is flag-like,
      e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid
      caching data from in-flux memslots, the implementation oftentimes treats
      the bit as part of the generation number itself, e.g. a memslot update
      increments the generation twice, once to set the flag and once to clear it.
      
      Prior to commit 4bd518f1 ("KVM: use separate generations for
      each address space"), incorporating the "update in-progress" bit into
      the generation number largely made sense, e.g. "real" generations are
      even, "bogus" generations are odd, most code doesn't need to be aware of
      the bit, etc...
      
      Now that unique memslots generation numbers are assigned to each address
      space, stealthing the in-progress status into the generation number
      results in a wide variety of subtle code, e.g. kvm_create_vm() jumps
      over bit 0 when initializing the memslots generation without any hint as
      to why.
      
      Explicitly define the flag and convert as much code as possible (which
      isn't much) to actually treat it like a flag.  This paves the way for
      eventually using a different bit for "update in-progress" so that it can
      be a flag in truth instead of an awkward extension to the generation
      number.
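      
      Spelled out, the double increment described above looks roughly like
      this (the macro name matches the mainline define; the helper is a
      simplified sketch of the update sequence, not the actual
      install_new_memslots()):
      
        #include <stdint.h>
        
        #define KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS (UINT64_C(1) << 0)
        
        static void memslots_update_generation(uint64_t *generation)
        {
            /* First increment sets bit 0: readers see "update in progress"
             * and must not cache MMIO state derived from the memslots. */
            *generation += 1;
        
            /* ... the new memslots are installed here ... */
        
            /* Second increment clears the flag and yields the next "real"
             * (even) generation, which is why the flag ends up treated as
             * part of the counter rather than as a true flag. */
            *generation += 1;
        }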
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> #openEuler_contributor
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  3. 22 September 2021, 1 commit
  4. 01 June 2021, 1 commit
  5. 22 April 2021, 3 commits
  6. 14 April 2021, 1 commit
  7. 11 March 2021, 1 commit
  8. 29 October 2020, 2 commits
  9. 13 January 2020, 1 commit
    • KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence · 66f2b4b5
      Marc Zyngier authored
      mainline inclusion
      from mainline-v5.4-rc1
      commit 07ab0f8d
      category: feature
      feature: a guest VLPI delivered during the halt-poll window
               can be immediately recognised
      
      -------------------------------------------------
      
      When a vcpu is about to block by calling kvm_vcpu_block, we call
      back into the arch code to allow any form of synchronization that
      may be required at this point (SVM stops the AVIC, ARM synchronises
      the VMCR and enables GICv4 doorbells). But this synchronization
      comes in quite late, as we've potentially waited for halt_poll_ns
      to expire.
      
      Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
      kvm_vcpu_block(), which on ARM has several benefits:
      
      - VMCR gets synchronised early, meaning that any interrupt delivered
        during the polling window will be evaluated with the correct guest
        PMR
      - GICv4 doorbells are enabled, which means that any guest interrupt
        directly injected during that window will be immediately recognised
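      
      In outline, the change is the following reordering (a compilable sketch:
      halt_poll() and wait_for_wakeup() are hypothetical stand-ins for the
      real polling and schedule logic inside kvm_vcpu_block()):
      
        struct kvm_vcpu;
        void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
        void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
        void halt_poll(struct kvm_vcpu *vcpu);        /* hypothetical */
        void wait_for_wakeup(struct kvm_vcpu *vcpu);  /* hypothetical */
        
        void kvm_vcpu_block(struct kvm_vcpu *vcpu)
        {
            /* Moved up from just before the wait: VMCR state and GICv4
             * doorbells are now live while halt_poll_ns elapses. */
            kvm_arch_vcpu_blocking(vcpu);
        
            halt_poll(vcpu);           /* poll for up to halt_poll_ns */
            wait_for_wakeup(vcpu);     /* actually sleep if polling failed */
        
            kvm_arch_vcpu_unblocking(vcpu);
        }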
      
      Tang Nianyao ran some tests on a GICv4 machine to evaluate this
      change, and reported up to a 10% improvement for netperf:
      
      <quote>
      	netperf result:
      	D06 as server, intel 8180 server as client
      	with change:
      	packet 512 bytes - 5500 Mbits/s
      	packet 64 bytes - 760 Mbits/s
      	without change:
      	packet 512 bytes - 5000 Mbits/s
      	packet 64 bytes - 710 Mbits/s
      </quote>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  10. 27 December 2019, 14 commits
  11. 23 August 2018, 1 commit
    • mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko authored
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start, and that is a problem for the
      oom_reaper because it needs to guarantee forward progress and so cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks,
      there is no reason to automatically assume those locks are held.  Moreover,
      the majority of notifiers only care about a portion of the address space,
      and there is absolutely zero reason to fail when we are unmapping an
      unrelated range.  Many notifiers do really block and wait for HW, which is
      harder to handle, and there we do have to bail out.
      
      This patch handles the low-hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag, and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continuing as long as we do not block down the call chain.
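      
      In a notifier, the pattern looks roughly like this (a sketch against the
      signature this patch introduces; struct example_ctx and
      invalidate_range() are hypothetical stand-ins for a driver's own state
      and teardown logic):
      
        #include <linux/errno.h>
        #include <linux/mmu_notifier.h>
        #include <linux/mutex.h>
        
        struct example_ctx {
            struct mmu_notifier mn;
            struct mutex lock;
        };
        
        static void invalidate_range(struct example_ctx *ctx,
                                     unsigned long start, unsigned long end);
        
        static int example_invalidate_range_start(struct mmu_notifier *mn,
                                                  struct mm_struct *mm,
                                                  unsigned long start,
                                                  unsigned long end,
                                                  bool blockable)
        {
            struct example_ctx *ctx = container_of(mn, struct example_ctx, mn);
        
            if (blockable) {
                mutex_lock(&ctx->lock);
            } else if (!mutex_trylock(&ctx->lock)) {
                /* Called on behalf of the oom_reaper: must not sleep, so
                 * ask the caller to back off and retry instead. */
                return -EAGAIN;
            }
        
            invalidate_range(ctx, start, end);
            mutex_unlock(&ctx->lock);
            return 0;
        }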
      
      I think we can improve that even further, because there is a common pattern
      of doing a range lookup first and then acting on the result.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper side then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g. after the test faults in all
      the mmu-notifier-managed memory and then sets the hard limit to something
      really small.  Then we watch for a proper process teardown.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 August 2018, 1 commit