1. 30 8月, 2018 1 次提交
  2. 23 8月, 2018 1 次提交
    • M
      mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko 提交于
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start and that is a problem for the
      oom_reaper because it needs to guarantee a forward progress so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover
      majority of notifiers only care about a portion of the address space and
      there is absolutely zero reason to fail when we are unmapping an unrelated
      range.  Many notifiers do really block and wait for HW which is harder to
      handle and we have to bail out though.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continue as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g.  after the test faults in all
      the mmu notifier managed memory and set the hard limit to something really
      small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93065ac7
  3. 22 8月, 2018 5 次提交
    • P
      KVM: VMX: fixes for vmentry_l1d_flush module parameter · 0027ff2a
      Paolo Bonzini 提交于
      Two bug fixes:
      
      1) missing entries in the l1d_param array; this can cause a host crash
      if an access attempts to reach the missing entry. Future-proof the get
      function against any overflows as well.  However, the two entries
      VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
      not be accepted by the parse function, so disable them there.
      
      2) invalid values must be rejected even if the CPU does not have the
      bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
      
      ... and a small refactoring, since the .cmd field is redundant with
      the index in the array.
      Reported-by: NBandan Das <bsd@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: a7b9020bSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0027ff2a
    • T
      KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled · 024d83ca
      Thomas Gleixner 提交于
      Mikhail reported the following lockdep splat:
      
      WARNING: possible irq lock inversion dependency detected
      CPU 0/KVM/10284 just changed the state of lock:
        000000000d538a88 (&st->lock){+...}, at:
        speculative_store_bypass_update+0x10b/0x170
      
      but this lock was taken by another, HARDIRQ-safe lock
      in the past:
      
      (&(&sighand->siglock)->rlock){-.-.}
      
         and interrupts could create inverse lock ordering between them.
      
      Possible interrupt unsafe locking scenario:
      
          CPU0                    CPU1
          ----                    ----
         lock(&st->lock);
                                 local_irq_disable();
                                 lock(&(&sighand->siglock)->rlock);
                                 lock(&st->lock);
          <Interrupt>
           lock(&(&sighand->siglock)->rlock);
           *** DEADLOCK ***
      
      The code path which connects those locks is:
      
         speculative_store_bypass_update()
         ssb_prctl_set()
         do_seccomp()
         do_syscall_64()
      
      In svm_vcpu_run() speculative_store_bypass_update() is called with
      interupts enabled via x86_virt_spec_ctrl_set_guest/host().
      
      This is actually a false positive, because GIF=0 so interrupts are
      disabled even if IF=1; however, we can easily move the invocations of
      x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to
      cure it, and it's a good idea to keep the GIF=0/IF=1 area as small
      and self-contained as possible.
      
      Fixes: 1f50ddb4 ("x86/speculation: Handle HT correctly on AMD")
      Reported-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: x86@kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      024d83ca
    • S
      KVM: vmx: Inject #UD for SGX ENCLS instruction in guest · 0b665d30
      Sean Christopherson 提交于
      Virtualization of Intel SGX depends on Enclave Page Cache (EPC)
      management that is not yet available in the kernel, i.e. KVM support
      for exposing SGX to a guest cannot be added until basic support
      for SGX is upstreamed, which is a WIP[1].
      
      Until SGX is properly supported in KVM, ensure a guest sees expected
      behavior for ENCLS, i.e. all ENCLS #UD.  Because SGX does not have a
      true software enable bit, e.g. there is no CR4.SGXE bit, the ENCLS
      instruction can be executed[1] by the guest if SGX is supported by the
      system.  Intercept all ENCLS leafs (via the ENCLS- exiting control and
      field) and unconditionally inject #UD.
      
      [1] https://www.spinics.net/lists/kvm/msg171333.html or
          https://lkml.org/lkml/2018/7/3/879
      
      [2] A guest can execute ENCLS in the sense that ENCLS will not take
          an immediate #UD, but no ENCLS will ever succeed in a guest
          without explicit support from KVM (map EPC memory into the guest),
          unless KVM has a *very* egregious bug, e.g. accidentally mapped
          EPC memory into the guest SPTEs.  In other words this patch is
          needed only to prevent the guest from seeing inconsistent behavior,
          e.g. #GP (SGX not enabled in Feature Control MSR) or #PF (leaf
          operand(s) does not point at EPC memory) instead of #UD on ENCLS.
          Intercepting ENCLS is not required to prevent the guest from truly
          utilizing SGX.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20180814163334.25724-3-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b665d30
    • Y
      x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush() · d806afa4
      Yi Wang 提交于
      Substitute spaces with tab. No functional changes.
      Signed-off-by: NYi Wang <wang.yi59@zte.com.cn>
      Reviewed-by: NJiang Biao <jiang.biao2@zte.com.cn>
      Message-Id: <1534398159-48509-1-git-send-email-wang.yi59@zte.com.cn>
      Cc: stable@vger.kernel.org # L1TF
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d806afa4
    • A
      x86: kvm: avoid unused variable warning · 7288bde1
      Arnd Bergmann 提交于
      Removing one of the two accesses of the maxphyaddr variable led to
      a harmless warning:
      
      arch/x86/kvm/x86.c: In function 'kvm_set_mmio_spte_mask':
      arch/x86/kvm/x86.c:6563:6: error: unused variable 'maxphyaddr' [-Werror=unused-variable]
      
      Removing the #ifdef seems to be the nicest workaround, as it
      makes the code look cleaner than adding another #ifdef.
      
      Fixes: 28a1f3ac ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: stable@vger.kernel.org # L1TF
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7288bde1
  4. 21 8月, 2018 1 次提交
  5. 15 8月, 2018 1 次提交
  6. 07 8月, 2018 1 次提交
  7. 06 8月, 2018 30 次提交