1. 15 December 2020 (12 commits)
    • KVM: SVM: Do not report support for SMM for an SEV-ES guest · 5719455f
      Committed by Tom Lendacky
      SEV-ES guests do not currently support SMM. Update the has_emulated_msr()
      kvm_x86_ops function to take a struct kvm parameter so that the capability
      can be reported at a VM level.
      
      Since this op is also called during KVM initialization and before a struct
      kvm instance is available, comments will be added to each implementation
      of has_emulated_msr() to indicate the kvm parameter can be null.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <75de5138e33b945d2fb17f81ae507bda381808e3.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
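
      A minimal sketch of the reshaped hook (the body here is illustrative;
      only the added struct kvm parameter and its NULL-capable contract come
      from the commit text):

      /* kvm may be NULL: this op is also called during KVM initialization,
       * before any struct kvm instance exists. */
      static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
      {
              switch (index) {
              case MSR_IA32_SMBASE:
                      /* SEV-ES guests do not support SMM. */
                      if (kvm && sev_es_guest(kvm))
                              return false;
                      break;
              default:
                      break;
              }
              return true;
      }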
    • KVM: SVM: Add support for CR8 write traps for an SEV-ES guest · d1949b93
      Committed by Tom Lendacky
      For SEV-ES guests, the interception of control register write access
      is not recommended. Control register interception occurs prior to the
      control register being modified and the hypervisor is unable to modify
      the control register itself because the register is located in the
      encrypted register state.
      
      SEV-ES guests introduce new control register write traps. These traps
      provide intercept support of a control register write after the control
      register has been modified. The new control register value is provided in
      the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
      of the guest control registers.
      
      Add support to track the value of the guest CR8 register using the control
      register write trap so that the hypervisor understands the guest operating
      mode.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <5a01033f4c8b3106ca9374b7cadf8e33da852df1.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
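
      A hedged sketch of the trap-handler shape shared by this entry and the
      CR4, CR0 and EFER entries below (handler and constant names are
      illustrative; the commit text only confirms that the new value arrives
      in EXITINFO1 after the guest write has completed):

      static int cr_trap(struct vcpu_svm *svm)
      {
              /* The guest has already written the register; EXITINFO1
               * carries the new value, so KVM only records it. */
              unsigned long new_value = svm->vmcb->control.exit_info_1;

              switch (svm->vmcb->control.exit_code) {
              case SVM_EXIT_CR8_WRITE_TRAP:
                      kvm_set_cr8(&svm->vcpu, new_value);
                      break;
              /* The CR0, CR4 and EFER write traps follow the same
               * pattern: track the new value, never modify it. */
              }
              return kvm_complete_insn_gp(&svm->vcpu, 0);
      }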
    • KVM: SVM: Add support for CR4 write traps for an SEV-ES guest · 5b51cb13
      Committed by Tom Lendacky
      For SEV-ES guests, the interception of control register write access
      is not recommended. Control register interception occurs prior to the
      control register being modified and the hypervisor is unable to modify
      the control register itself because the register is located in the
      encrypted register state.
      
      SEV-ES guests introduce new control register write traps. These traps
      provide intercept support of a control register write after the control
      register has been modified. The new control register value is provided in
      the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
      of the guest control registers.
      
      Add support to track the value of the guest CR4 register using the control
      register write trap so that the hypervisor understands the guest operating
      mode.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <c3880bf2db8693aa26f648528fbc6e967ab46e25.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add support for CR0 write traps for an SEV-ES guest · f27ad38a
      Committed by Tom Lendacky
      For SEV-ES guests, the interception of control register write access
      is not recommended. Control register interception occurs prior to the
      control register being modified and the hypervisor is unable to modify
      the control register itself because the register is located in the
      encrypted register state.
      
      SEV-ES support introduces new control register write traps. These traps
      provide intercept support of a control register write after the control
      register has been modified. The new control register value is provided in
      the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
      of the guest control registers.
      
      Add support to track the value of the guest CR0 register using the control
      register write trap so that the hypervisor understands the guest operating
      mode.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <182c9baf99df7e40ad9617ff90b84542705ef0d7.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Add support for EFER write traps for an SEV-ES guest · 2985afbc
      Committed by Tom Lendacky
      For SEV-ES guests, the interception of EFER write access is not
      recommended. EFER interception occurs prior to EFER being modified and
      the hypervisor is unable to modify EFER itself because the register is
      located in the encrypted register state.
      
      SEV-ES support introduces a new EFER write trap. This trap provides
      intercept support of an EFER write after it has been modified. The new
      EFER value is provided in the VMCB EXITINFO1 field, allowing the
      hypervisor to track the setting of the guest EFER.
      
      Add support to track the value of the guest EFER value using the EFER
      write trap so that the hypervisor understands the guest operating mode.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <8993149352a3a87cd0625b3b61bfd31ab28977e1.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Support string IO operations for an SEV-ES guest · 7ed9abfe
      Committed by Tom Lendacky
      For an SEV-ES guest, string-based port IO is performed to a shared
      (un-encrypted) page so that both the hypervisor and guest can read or
      write to it and each see the contents.
      
      For string-based port IO operations, invoke SEV-ES specific routines that
      can complete the operation using common KVM port IO support.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
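
      A sketch of the dispatch this describes, assuming helper and field
      names; the commit confirms only that SEV-ES string IO is routed through
      SEV-ES-specific routines built on the common KVM port-IO support:

      static int string_io(struct vcpu_svm *svm, unsigned int size,
                           unsigned int port, unsigned int count, int in)
      {
              /* For SEV-ES the data buffer is the shared (unencrypted)
               * scratch area the guest supplied through the GHCB. */
              if (sev_es_guest(svm->vcpu.kvm))
                      return kvm_sev_es_string_io(&svm->vcpu, size, port,
                                                  svm->ghcb_sa, count, in);

              /* Non-SEV-ES guests keep the existing emulation path. */
              return kvm_emulate_instruction(&svm->vcpu, 0);
      }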
    • KVM: SVM: Add initial support for a VMGEXIT VMEXIT · 291bd20d
      Committed by Tom Lendacky
      SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a
      VMGEXIT includes mapping the GHCB based on the guest GPA, which is
      obtained from a new VMCB field, and then validating the required inputs
      for the VMGEXIT exit reason.
      
      Since many of the VMGEXIT exit reasons correspond to existing VMEXIT
      reasons, the information from the GHCB is copied into the VMCB control
      exit code areas and KVM register areas. The standard exit handlers are
      invoked, similar to standard VMEXIT processing. Before restarting the
      vCPU, the GHCB is updated with any registers that have been updated by
      the hypervisor.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
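
      An outline of that flow with illustrative names; the steps themselves
      (map the GHCB by GPA, validate inputs, copy state, run the standard
      handler, write results back) are the ones the commit text describes:

      static int sev_handle_vmgexit(struct vcpu_svm *svm)
      {
              u64 ghcb_gpa = svm->vmcb->control.ghcb_gpa;

              /* Map the GHCB using the guest GPA from the new VMCB field. */
              if (kvm_vcpu_map(&svm->vcpu, ghcb_gpa >> PAGE_SHIFT,
                               &svm->ghcb_map))
                      return -EINVAL;

              /* Validate the inputs required for this exit reason. */
              if (!sev_es_validate_vmgexit(svm))
                      return 1;

              /* Copy GHCB state into the VMCB exit-code fields and the KVM
               * register areas, then run the standard exit handler; updated
               * registers are copied back to the GHCB before the vCPU is
               * restarted. */
              sev_es_sync_from_ghcb(svm);
              return svm_invoke_exit_handler(svm,
                                             ghcb_get_sw_exit_code(svm->ghcb));
      }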
    • KVM: SVM: Add required changes to support intercepts under SEV-ES · f1c6366e
      Committed by Tom Lendacky
      When a guest is running under SEV-ES, the hypervisor cannot access the
      guest register state. There are numerous places in the KVM code where
      certain registers are accessed that are not allowed to be accessed (e.g.
      RIP, CR0, etc). Add checks to prevent register accesses and add intercept
      update support at various points within the KVM code.
      
      Also, when handling a VMGEXIT, exceptions are passed back through the
      GHCB. Since the RDMSR/WRMSR intercepts (may) inject a #GP on error,
      update the SVM intercepts to handle this for SEV-ES guests.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      [Redo MSR part using the .complete_emulated_msr callback. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
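
      The guard pattern this implies, sketched with the guest_state_protected
      flag introduced by the VMSA entry below (the check placement is an
      assumption):

      static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
      {
              /* RIP, CR0 and friends live in the encrypted VMSA, so there
               * is nothing KVM can read back for an SEV-ES guest. */
              if (vcpu->arch.guest_state_protected)
                      return;

              /* ... normal register caching for unprotected guests ... */
      }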
    • KVM: x86: introduce complete_emulated_msr callback · f9a4d621
      Committed by Paolo Bonzini
      This will be used by SEV-ES to inject MSR failure via the GHCB.
      Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
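
      A sketch of the new callback, plus an assumed SVM implementation based
      on the SEV-ES entries above (only the callback itself is introduced by
      this commit):

      struct kvm_x86_ops {
              /* ... */
              int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
      };

      static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
      {
              if (!err || !sev_es_guest(vcpu->kvm))
                      return kvm_complete_insn_gp(vcpu, err);

              /* SEV-ES: report the MSR failure through the GHCB instead
               * of injecting #GP directly. */
              ghcb_set_sw_exit_info_1(to_svm(vcpu)->ghcb, 1);
              return 1;
      }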
    • KVM: SVM: Add support for the SEV-ES VMSA · add5e2f0
      Committed by Tom Lendacky
      Allocate a page during vCPU creation to be used as the encrypted VM save
      area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch
      structure that indicates whether the guest state is protected.
      
      When freeing a VMSA page that has been encrypted, the cache contents
      must be flushed using the MSR_AMD64_VM_PAGE_FLUSH MSR before freeing
      the page.
      
      [ i386 build warnings ]
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <fde272b17eec804f3b9db18c131262fe074015c5.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
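
      The flush-before-free rule, sketched; the MSR operand layout (guest
      page address plus ASID) follows the APM description of VM_PAGE_FLUSH
      and is an assumption here:

      static void sev_free_vmsa(struct page *vmsa_page, unsigned int asid)
      {
              /* Flush the encrypted mapping's cache lines; a WBINVD across
               * all CPUs would be the far costlier alternative. */
              wrmsrl(MSR_AMD64_VM_PAGE_FLUSH,
                     (page_to_pfn(vmsa_page) << PAGE_SHIFT) | asid);
              __free_page(vmsa_page);
      }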
    • KVM: SVM: Add GHCB accessor functions for retrieving fields · 0f60bde1
      Committed by Tom Lendacky
      Update the GHCB accessor functions to add functions that retrieve GHCB
      fields by name. Update existing code to use the new accessor functions.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <664172c53a5fb4959914e1a45d88e805649af0ad.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
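
      The accessor shape this implies, as a sketch (the real header generates
      one pair per field from a macro; names follow the existing
      ghcb_set_*/ghcb_*_is_valid convention):

      #define DEFINE_GHCB_GET(field)                                    \
              static inline u64 ghcb_get_##field(struct ghcb *ghcb)     \
              {                                                         \
                      return ghcb->save.field;                          \
              }                                                         \
                                                                        \
              static inline u64                                         \
              ghcb_get_##field##_if_valid(struct ghcb *ghcb)            \
              {                                                         \
                      return ghcb_##field##_is_valid(ghcb) ?            \
                              ghcb->save.field : 0;                     \
              }

      DEFINE_GHCB_GET(rax)   /* ghcb_get_rax(), ghcb_get_rax_if_valid() */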
    • x86/cpu: Add VM page flush MSR availability as a CPUID feature · 69372cf0
      Committed by Tom Lendacky
      On systems that do not have hardware enforced cache coherency between
      encrypted and unencrypted mappings of the same physical page, the
      hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache
      contents of an SEV guest page. When a small number of pages are being
      flushed, this can be used in place of issuing a WBINVD across all CPUs.
      
      CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is
      available. Add a CPUID feature to indicate it is supported and define the
      MSR.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <f1966379e31f9b208db5257509c4a089a87d33d0.1607620209.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
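
      Both facts above are directly checkable; a minimal sketch:

      #define MSR_AMD64_VM_PAGE_FLUSH 0xc001011e

      /* CPUID 0x8000001f EAX bit 2: VM page flush MSR is available. */
      static bool vm_page_flush_supported(void)
      {
              return cpuid_eax(0x8000001f) & BIT(2);
      }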
  2. 12 December 2020 (1 commit)
    • x86: Enumerate AVX512 FP16 CPUID feature flag · e1b35da5
      Committed by Kyung Min Park
      Enumerate the AVX512 Half-precision floating point (FP16) CPUID feature
      flag. Compared with FP32, FP16 cuts the number of bits required for
      storage in half, reducing the exponent from 8 bits to 5 and the
      mantissa from 23 bits to 10. FP16 also lets developers train and run
      inference on deep learning models faster when the full precision or
      range of FP32 is not needed.
      
      A processor supports AVX512 FP16 if CPUID.(EAX=7,ECX=0):EDX[bit 23]
      is set. AVX512 FP16 requires the AVX512BW feature, since the
      instructions for manipulating 32-bit masks are associated with
      AVX512BW.
      
      The only in-kernel usage of this is KVM passthrough. The CPU feature
      flag is shown as "avx512_fp16" in /proc/cpuinfo.
      Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Message-Id: <20201208033441.28207-2-kyung.min.park@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
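
      The enumeration check as a sketch, using the bit position given above:

      /* AVX512 FP16: CPUID.(EAX=7,ECX=0):EDX[bit 23]. */
      static bool has_avx512_fp16(void)
      {
              unsigned int eax, ebx, ecx, edx;

              cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
              return edx & BIT(23);
      }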
  3. 15 November 2020 (4 commits)
    • KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Committed by Peter Xu
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue arises for live migration when the guest memory is
      huge while only a small set of pages is dirtied in each pass.  In that
      case every dirty sync needs to pull the whole dirty bitmap to
      userspace and analyse every bit even if it's mostly zeros.
      
      The preferred data structure for the above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables dirty ring for X86 only.  However it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/
      Signed-off-by: Lei Cao <lei.cao@stratus.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
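
      The dense per-page record the text argues for looks like the entry
      below (field meanings per this series; the flags handshake between
      producer and consumer is omitted):

      /* One dirty-ring entry, mmap()ed into userspace for cheap harvesting. */
      struct kvm_dirty_gfn {
              __u32 flags;    /* producer/consumer handshake state */
              __u32 slot;     /* address space id and memslot id */
              __u64 offset;   /* page offset within the memslot */
      };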
    • KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR] · ff5a983c
      Committed by Peter Xu
      Originally, we have three code paths that can dirty a page without
      vcpu context for X86:
      
        - init_rmode_identity_map
        - init_rmode_tss
        - kvmgt_rw_gpa
      
      init_rmode_identity_map and init_rmode_tss will be set up on the
      destination VM no matter what (and the guest cannot even see them), so
      it does not make sense to track them at all.
      
      To do this, allow __x86_set_memory_region() to return the userspace
      address that was just allocated to the caller.  Then in both of the
      functions we directly write to the userspace address instead of
      calling kvm_write_*() APIs.
      
      Another trivial change is that we don't need to explicitly clear the
      identity page table root in init_rmode_identity_map(), because we will
      overwrite the whole page with 4M huge page entries anyway.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012044.5151-4-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
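
      A sketch of the resulting call pattern, assuming the changed return
      type; the function body is illustrative:

      static int init_private_memslot(struct kvm *kvm, gpa_t base)
      {
              void __user *hva;

              /* __x86_set_memory_region() now hands back the userspace
               * address of the region it allocated... */
              hva = __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT,
                                            base, PAGE_SIZE * 3);
              if (IS_ERR(hva))
                      return PTR_ERR(hva);

              /* ...so we write through it directly: no kvm_write_*(),
               * hence nothing is marked dirty. */
              return clear_user(hva, PAGE_SIZE * 3) ? -EFAULT : 0;
      }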
    • KVM: x86: emulate wait-for-SIPI and SIPI-VMExit · bf0cd88c
      Committed by Yadong Qi
      Background: We have a lightweight HV that needs INIT-VMExit and
      SIPI-VMExit to wake up APs for guests, since it does not monitor
      the Local APIC. But currently the virtual wait-for-SIPI (WFS) state
      is not supported in nVMX, so when running on top of KVM, the L1
      HV cannot receive the INIT-VMExit and SIPI-VMExit, which means
      the L2 guest cannot wake up the APs.
      
      According to Intel SDM Chapter 25.2 Other Causes of VM Exits,
      SIPIs cause VM exits when a logical processor is in
      wait-for-SIPI state.
      
      In this patch:
          1. introduce SIPI exit reason,
          2. introduce wait-for-SIPI state for nVMX,
          3. advertise wait-for-SIPI support to guest.
      
      When the L1 hypervisor is not monitoring the Local APIC, L0 needs to
      emulate INIT-VMExit and SIPI-VMExit to L1 in order to emulate
      INIT-SIPI-SIPI for L2. An L2 LAPIC write is trapped by the L0
      hypervisor (KVM), and L0 then emulates the INIT/SIPI VM exit to the
      L1 hypervisor to set the proper state for L2's vCPU.
      
      Handling procedure:
      Source vCPU:
          L2 writes LAPIC.ICR(INIT).
          L0 traps the LAPIC.ICR write (INIT): injects a latched INIT event
             into the target vCPU.
      Target vCPU:
          L0 emulates an INIT VMExit to L1 if the vCPU is in guest mode.
          L1 sets guest VMCS, guest_activity_state=WAIT_SIPI, vmresume.
          L0 sets vcpu.mp_state to INIT_RECEIVED if (vmcs12.guest_activity_state
             == WAIT_SIPI).

      Source vCPU:
          L2 writes LAPIC.ICR(SIPI).
          L0 traps the LAPIC.ICR write (SIPI): injects a latched SIPI event
             into the target vCPU.
      Target vCPU:
          L0 emulates a SIPI VMExit to L1 if (vcpu.mp_state == INIT_RECEIVED).
          L1 sets CS:IP, guest_activity_state=ACTIVE, vmresume.
          L0 resumes to L2.
          L2 starts up.
      Signed-off-by: Yadong Qi <yadong.qi@intel.com>
      Message-Id: <20200922052343.84388-1-yadong.qi@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20201106065122.403183-1-yadong.qi@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
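
      The target-vCPU half of the procedure, sketched; the SIPI exit-reason
      value is from the SDM, while the function and its placement are
      assumptions:

      #define EXIT_REASON_SIPI_SIGNAL 4

      static void reflect_latched_sipi(struct kvm_vcpu *vcpu, u8 vector)
      {
              /* A latched SIPI arriving while the vCPU sits in
               * INIT_RECEIVED is reflected to L1 as a SIPI VM exit, with
               * the vector in the exit qualification, instead of being
               * delivered directly. */
              if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED &&
                  is_guest_mode(vcpu))
                      nested_vmx_vmexit(vcpu, EXIT_REASON_SIPI_SIGNAL, 0,
                                        vector);
      }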
    • KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · c2fe3cd4
      Committed by Sean Christopherson
      Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
      and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
      KVM_SET_SREGS would return success while failing to actually set CR4.
      
      Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
      in __set_sregs() is not a viable option as KVM has already stuffed a
      variety of vCPU state.
      
      Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
      inverted semantics.  This will be remedied in a future patch.
      
      Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
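
      The resulting split, sketched; note the int-vs-bool mismatch called out
      in the last paragraph (cr4_reserved_bits handling is simplified):

      static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
      {
              if (cr4 & cr4_reserved_bits)
                      return -EINVAL;

              /* Vendor veto (e.g. VMX rejecting a CR4.VMXE toggle) happens
               * before any vCPU state is committed, so KVM_SET_SREGS can
               * now fail cleanly up front. */
              if (!kvm_x86_ops.is_valid_cr4(vcpu, cr4))
                      return -EINVAL;

              return 0;
      }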
  4. 13 November 2020 (1 commit)
    • KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch · 0107973a
      Committed by Babu Moger
      SEV guests fail to boot on a system that supports the PCID feature.
      
      While emulating the RSM instruction, KVM reads the guest CR3
      and calls kvm_set_cr3(). If the vCPU is in the long mode,
      kvm_set_cr3() does a sanity check for the CR3 value. In this case,
      it validates whether the value has any reserved bits set. The
      reserved bit range is 63:cpuid_maxphyaddr(). When AMD memory
      encryption is enabled, the memory encryption bit is set in the CR3
      value. The memory encryption bit may fall within the KVM reserved
      bit range, causing the KVM emulation failure.
      
      Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
      cache the reserved bits in the CR3 value. This will be initialized
      to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).
      
      If the architecture has any special bits (like the AMD SEV encryption
      bit) that need to be masked from the reserved bits, they should be
      cleared in the vendor-specific kvm_x86_ops.vcpu_after_set_cpuid handler.
      
      Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
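
      The vendor-side half, sketched from the description above (the common
      code initializes cr3_lm_rsvd_bits to rsvd_bits(cpuid_maxphyaddr(vcpu),
      63) as quoted; the SVM hook below is illustrative):

      static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
      {
              struct kvm_cpuid_entry2 *best;

              /* CPUID 0x8000001F EBX[5:0] gives the position of the memory
               * encryption bit, which must not be treated as reserved in
               * CR3 for an SEV guest. */
              best = kvm_find_cpuid_entry(vcpu, 0x8000001F, 0);
              if (best)
                      vcpu->arch.cr3_lm_rsvd_bits &=
                              ~(1UL << (best->ebx & 0x3f));
      }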
  5. 29 October 2020 (1 commit)
  6. 26 October 2020 (1 commit)
  7. 24 October 2020 (1 commit)
    • x86/uaccess: fix code generation in put_user() · 9c5743df
      Committed by Rasmus Villemoes
      Quoting https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:
      
        You can define a local register variable and associate it with a
        specified register...
      
        The only supported use for this feature is to specify registers for
        input and output operands when calling Extended asm (see Extended
        Asm). This may be necessary if the constraints for a particular
        machine don't provide sufficient control to select the desired
        register.
      
      On 32-bit x86, this is used to ensure that gcc will put an 8-byte value
      into the %edx:%eax pair, while all other cases will just use the single
      register %eax (%rax on x86-64).  While the _ASM_AX actually just expands
      to "%eax", note this comment next to get_user() which does something
      very similar:
      
       * The use of _ASM_DX as the register specifier is a bit of a
       * simplification, as gcc only cares about it as the starting point
       * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
       * (%ecx being the next register in gcc's x86 register sequence), and
       * %rdx on 64 bits.
      
      However, getting this to work requires that there is no code between the
      assignment to the local register variable and its use as an input to the
      asm() which can possibly clobber any of the registers involved -
      including evaluation of the expressions making up other inputs.
      
      In the current code, the ptr expression used directly as an input may
      cause such code to be emitted.  For example, Sean Christopherson
      observed that with KASAN enabled and ptr being current->set_child_tid
      (from schedule_tail()), the load of current->set_child_tid causes a call
      to __asan_load8() to be emitted immediately prior to the __put_user_4
      call, and Naresh Kamboju reports that various mmstress tests fail on
      KASAN-enabled builds.
      
      It's also possible to synthesize a broken case without KASAN if one uses
      "foo()" as the ptr argument, with foo being some "extern u64 __user
      *foo(void);" (though I don't know if that appears in real code).
      
      Fix it by making sure ptr gets evaluated before the assignment to
      __val_pu, and add a comment that __val_pu must be the last thing
      computed before the asm() is entered.
      
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Fixes: d55564cf ("x86: Make __put_user() generate an out-of-line call")
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
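
      A simplified sketch of the ordering fix (not the exact kernel macro):

      #define put_user_sketch(x, ptr)                                      \
      ({                                                                   \
              void __user *__ptr_pu = (ptr); /* may emit calls (KASAN) */  \
              register __typeof__(*(ptr)) __val_pu asm("eax");             \
              /* Must be the LAST thing evaluated before the asm: any      \
               * intervening call would clobber %eax (or %edx:%eax). */    \
              __val_pu = (x);                                              \
              asm volatile("call __put_user_nocheck_%P[size]"              \
                           : "+c" (__ptr_pu)                               \
                           : "r" (__val_pu),                               \
                             [size] "i" (sizeof(*(ptr)))                   \
                           : "memory");                                    \
      })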
  8. 23 October 2020 (1 commit)
  9. 22 October 2020 (6 commits)
  10. 15 October 2020 (1 commit)
  11. 14 October 2020 (1 commit)
    • x86/numa: cleanup configuration dependent command-line options · 2dd57d34
      Committed by Dan Williams
      Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5.
      
      The device-dax facility allows an address range to be directly mapped
      through a chardev, or optionally hotplugged to the core kernel page
      allocator as System-RAM.  It is the mechanism for converting persistent
      memory (pmem) to be used as another volatile memory pool i.e.  the current
      Memory Tiering hot topic on linux-mm.
      
      In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
      it, but that labeling mechanism is not available / applicable to
      soft-reserved ("EFI specific purpose") memory [3].  This series provides a
      sysfs-mechanism for the daxctl utility to enable provisioning of
      volatile-soft-reserved memory ranges.
      
      The motivations for this facility are:
      
      1/ Allow performance differentiated memory ranges to be split between
         kernel-managed and directly-accessed use cases.
      
      2/ Allow physical memory to be provisioned along performance relevant
         address boundaries. For example, divide a memory-side cache [4] along
         cache-color boundaries.
      
      3/ Parcel out soft-reserved memory to VMs using device-dax as a security
         / permissions boundary [5]. Specifically I have seen people (ab)using
         memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
         device-dax interface on custom address ranges. A follow-on for the VM
         use case is to teach device-dax to dynamically allocate 'struct page' at
         runtime to reduce the duplication of 'struct page' space in both the
         guest and the host kernel for the same physical pages.
      
      [2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
      [3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
      [4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
      [5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com
      
      This patch (of 23):
      
      In preparation for adding a new numa= option, clean up the existing
      ones to avoid ifdefs in numa_setup() and to provide feedback when the
      numa=fake= option is invalid due to the kernel config.  The same does
      not need to be done for numa=noacpi, since the capability is already
      hard disabled at compile-time.
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
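
      A sketch of the cleanup itself (the device-dax material above is the
      series cover letter): numa_setup() stays free of #ifdefs because each
      option gets a helper that collapses to an error-reporting stub when
      its config is off. Helper names here are assumptions:

      static int __init numa_setup(char *opt)
      {
              if (!opt)
                      return -EINVAL;
              if (!strncmp(opt, "off", 3))
                      numa_off = 1;
              if (!strncmp(opt, "fake=", 5))
                      /* stub reports "not compiled in" and returns -EINVAL
                       * when CONFIG_NUMA_EMU is not set */
                      return numa_emu_cmdline(opt + 5);
              if (!strncmp(opt, "noacpi", 6))
                      disable_srat();
              return 0;
      }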
  12. 13 October 2020 (3 commits)
    • x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT · 865c50e1
      Committed by Nick Desaulniers
      Clang-11 shipped support for outputs to asm goto statements along the
      fallthrough path.  Double up some of the get_user() and related macros
      to be able to take advantage of this GNU C extension. This should help
      improve the generated code's performance for these accesses.
      
      Cc: Bill Wendling <morbo@google.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
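
      A sketch of the doubling-up: one definition when the compiler can
      produce outputs along asm goto's fallthrough path, and a fallback
      otherwise (macro name and the fallback body are illustrative; the real
      fallback is the existing optimized sequence):

      #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
      #define unsafe_get_user_sketch(x, ptr, err_label)                 \
              asm goto("1: mov %[umem], %[out]\n"                       \
                       _ASM_EXTABLE_UA(1b, %l[err_label])               \
                       : [out] "=r" (x)                                 \
                       : [umem] "m" (*(ptr))                            \
                       : : err_label)
      #else
      #define unsafe_get_user_sketch(x, ptr, err_label)                 \
      do {                                                              \
              if (__get_user(x, ptr))                                   \
                      goto err_label;                                   \
      } while (0)
      #endif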
    • x86: Make __put_user() generate an out-of-line call · d55564cf
      Committed by Linus Torvalds
      Instead of inlining the stac/mov/clac sequence (which also requires
      individual exception table entries and several asm instruction
      alternatives entries), just generate "call __put_user_nocheck_X" for the
      __put_user() cases, the same way we changed __get_user earlier.
      
      Unlike the get_user() case, we didn't have the same nice infrastructure
      to just generate the call with a single case, so this actually has to
      change some of the infrastructure in order to do this.  But that only
      cleans up the code further.
      
      So now, instead of using a case statement for the sizes, we just do the
      same thing we've done on the get_user() side for a long time: use the
      size as an immediate constant to the asm, and generate the asm that way
      directly.
      
      In order to handle the special case of 64-bit data on a 32-bit kernel, I
      needed to change the calling convention slightly: the data is passed in
      %eax[:%edx], the pointer in %ecx, and the return value is also returned
      in %ecx.  It used to be returned in %eax, but because of how %eax can
      now be a double register input, we don't want mix that with a
      single-register output.
      
      The actual low-level asm is easier to handle: we'll just share the code
      between the checking and non-checking case, with the non-checking case
      jumping into the middle of the function.  That may sound a bit too
      special, but this code is all very very special anyway, so...
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
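
      A sketch of the generated call per the convention above (simplified;
      the evaluation-order hazard visible here is the one later fixed by
      commit 9c5743df, covered earlier in this log):

      #define __put_user_sketch(x, ptr)                                 \
      ({                                                                \
              int __ret_pu;                                             \
              register __typeof__(*(ptr)) __val_pu asm("eax") = (x);    \
              asm volatile("call __put_user_nocheck_%P[size]"           \
                           : "=c" (__ret_pu)  /* error back in %ecx */  \
                           : "0" (ptr),       /* pointer in %ecx */     \
                             "r" (__val_pu),  /* %eax, or %edx:%eax */  \
                             [size] "i" (sizeof(*(ptr)))                \
                           : "memory");                                 \
              __ret_pu;                                                 \
      })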
    • x86: Make __get_user() generate an out-of-line call · ea6f043f
      Committed by Linus Torvalds
      Instead of inlining the whole stac/lfence/mov/clac sequence (which also
      requires individual exception table entries and several asm instruction
      alternatives entries), just generate "call __get_user_nocheck_X" for the
      __get_user() cases.
      
      We can use all the same infrastructure that we already do for the
      regular "get_user()", and the end result is simpler source code, and
      much simpler code generation.
      
      It also means that when I introduce asm goto with input for
      "unsafe_get_user()", there are no nasty interactions with the
      __get_user() code.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 08 October 2020 (1 commit)
  14. 07 October 2020 (6 commits)