1. 29 3月, 2019 7 次提交
    • B
      kvm: mmu: Used range based flushing in slot_handle_level_range · f285c633
      Ben Gardon 提交于
      Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address
      in slot_handle_level_range. When range based flushes are not enabled
      kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs.
      
      This changes the behavior of many functions that indirectly use
      slot_handle_level_range, iff the range based flushes are enabled. The
      only potential problem I see with this is that kvm->tlbs_dirty will be
      cleared less often, however the only caller of slot_handle_level_range that
      checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which
      checks it and does a kvm_flush_remote_tlbs after calling
      kvm_unmap_hva_range anyway.
      
      Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
      	without this patch. The patch introduced no new failures.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f285c633
    • M
      KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported · 3d9683cf
      Masahiro Yamada 提交于
      I do not see any consistency about headers_install of <linux/kvm_para.h>
      and <asm/kvm_para.h>.
      
      According to my analysis of Linux 5.1-rc1, there are 3 groups:
      
       [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
      
          alpha, arm, hexagon, mips, powerpc, s390, sparc, x86
      
       [2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not
      
          arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc,
          parisc, sh, unicore32, xtensa
      
       [3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
      
          csky, nds32, riscv
      
      This does not match to the actual KVM support. At least, [2] is
      half-baked.
      
      Nor do arch maintainers look like they care about this. For example,
      commit 0add5371 ("microblaze: Add missing kvm_para.h to Kbuild")
      exported <asm/kvm_para.h> to user-space in order to fix an in-kernel
      build error.
      
      We have two ways to make this consistent:
      
       [A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all
           architectures, irrespective of the KVM support
      
       [B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h>
           to the KVM support
      
      My first attempt was [A] because the code looks cleaner, but Paolo
      suggested [B].
      
      So, this commit goes with [B].
      
      For most architectures, <asm/kvm_para.h> was moved to the kernel-space.
      I changed include/uapi/linux/Kbuild so that it checks generated
      asm/kvm_para.h as well as check-in ones.
      
      After this commit, there will be two groups:
      
       [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
      
          arm, arm64, mips, powerpc, s390, x86
      
       [2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
      
          alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze,
          nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NCornelia Huck <cohuck@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3d9683cf
    • W
      KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() · 4d66623c
      Wei Yang 提交于
      * nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is
        non-zero.
      
      * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages()
        never return zero.
      
      Based on these two reasons, we can merge the two *if* clause and use the
      return value from kvm_mmu_calculate_mmu_pages() directly. This simplify
      the code and also eliminate the possibility for reader to believe
      nr_mmu_pages would be zero.
      Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4d66623c
    • K
      kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields · 711eff3a
      Krish Sadhukhan 提交于
      According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
      following check is performed on vmentry of L2 guests:
      
          On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP
          field and the IA32_SYSENTER_EIP field must each contain a canonical
          address.
      Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      711eff3a
    • S
      KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) · 05d5a486
      Singh, Brijesh 提交于
      Errata#1096:
      
      On a nested data page fault when CR.SMAP=1 and the guest data read
      generates a SMAP violation, GuestInstrBytes field of the VMCB on a
      VMEXIT will incorrectly return 0h instead the correct guest
      instruction bytes .
      
      Recommend Workaround:
      
      To determine what instruction the guest was executing the hypervisor
      will have to decode the instruction at the instruction pointer.
      
      The recommended workaround can not be implemented for the SEV
      guest because guest memory is encrypted with the guest specific key,
      and instruction decoder will not be able to decode the instruction
      bytes. If we hit this errata in the SEV guest then log the message
      and request a guest shutdown.
      Reported-by: NVenkatesh Srinivas <venkateshs@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      05d5a486
    • S
      KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' · 47c42e6b
      Sean Christopherson 提交于
      The cr4_pae flag is a bit of a misnomer, its purpose is really to track
      whether the guest PTE that is being shadowed is a 4-byte entry or an
      8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
      reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
      but it was mostly harmless since the size of the gpte never mattered.
      Now that a spte may be tracking an indirect EPT entry, relying on
      CR4.PAE is wrong and ill-named.
      
      For direct shadow pages, force the gpte_size to '1' as they are always
      8-byte entries; EPT entries can only be 8-bytes and KVM always uses
      8-byte entries for NPT and its identity map (when running with EPT but
      not unrestricted guest).
      
      Likewise, nested EPT entries are always 8-bytes.  Nested EPT presents a
      unique scenario as the size of the entries are not dictated by CR4.PAE,
      but neither is the shadow page a direct map.  To handle this scenario,
      set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
      to denote a nested EPT shadow page.  Use the information to avoid
      incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
      
      Providing a consistent and accurate gpte_size fixes a bug reported by
      Vitaly where fast_cr3_switch() always fails when switching from L2 to
      L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
      whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      47c42e6b
    • S
      KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT · 552c69b1
      Sean Christopherson 提交于
      Explicitly zero out quadrant and invalid instead of inheriting them from
      the root_mmu.  Functionally, this patch is a nop as we (should) never
      set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only
      allowed if EPT is used for L1, and the root_mmu will never be invalid at
      this point.
      
      Explicitly setting flags sets the stage for repurposing the legacy
      paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which
      point 'smm' would be the only flag to be inherited from root_mmu.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      552c69b1
  2. 23 3月, 2019 2 次提交
  3. 22 3月, 2019 1 次提交
    • V
      x86/mm/pti: Make local symbols static · 4fe64a62
      Valdis Kletnieks 提交于
      With 'make C=2 W=1', sparse and gcc both complain:
      
        CHECK   arch/x86/mm/pti.c
      arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
      arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
        CC      arch/x86/mm/pti.o
      arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
        605 | void pti_set_kernel_image_nonglobal(void)
            |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
      drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
      local (static) symbol.
      
      Make both static.
      Signed-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
      
      4fe64a62
  4. 21 3月, 2019 9 次提交
  5. 20 3月, 2019 1 次提交
    • B
      powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations · 8bc08689
      Ben Hutchings 提交于
      MAX_PHYSMEM_BITS only needs to be defined if CONFIG_SPARSEMEM is
      enabled, and that was the case before commit 4ffe713b
      ("powerpc/mm: Increase the max addressable memory to 2PB").
      
      On 32-bit systems, where CONFIG_SPARSEMEM is not enabled, we now
      define it as 46.  That is larger than the real number of physical
      address bits, and breaks calculations in zsmalloc:
      
        mm/zsmalloc.c:130:49: warning: right shift count is negative
          MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
                                                         ^~
        ...
        mm/zsmalloc.c:253:21: error: variably modified 'size_class' at file scope
          struct size_class *size_class[ZS_SIZE_CLASSES];
                             ^~~~~~~~~~
      
      Fixes: 4ffe713b ("powerpc/mm: Increase the max addressable memory to 2PB")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8bc08689
  6. 19 3月, 2019 9 次提交
  7. 18 3月, 2019 2 次提交
    • C
      powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32 · 4622a2d4
      Christophe Leroy 提交于
      Not only the 603 but all 6xx need SPRN_SPRG_PGDIR to be initialised at
      startup. This patch move it from __setup_cpu_603() to start_here()
      and __secondary_start(), close to the initialisation of SPRN_THREAD.
      
      Previously, virt addr of PGDIR was retrieved from thread struct.
      Now that it is the phys addr which is stored in SPRN_SPRG_PGDIR,
      hash_page() shall not convert it to phys anymore.
      This patch removes the conversion.
      
      Fixes: 93c4a162 ("powerpc/6xx: Store PGDIR physical address in a SPRG")
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4622a2d4
    • M
      powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038 · b5b4453e
      Michael Ellerman 提交于
      Jakub Drnec reported:
        Setting the realtime clock can sometimes make the monotonic clock go
        back by over a hundred years. Decreasing the realtime clock across
        the y2k38 threshold is one reliable way to reproduce. Allegedly this
        can also happen just by running ntpd, I have not managed to
        reproduce that other than booting with rtc at >2038 and then running
        ntp. When this happens, anything with timers (e.g. openjdk) breaks
        rather badly.
      
      And included a test case (slightly edited for brevity):
        #define _POSIX_C_SOURCE 199309L
        #include <stdio.h>
        #include <time.h>
        #include <stdlib.h>
        #include <unistd.h>
      
        long get_time(void) {
          struct timespec tp;
          clock_gettime(CLOCK_MONOTONIC, &tp);
          return tp.tv_sec + tp.tv_nsec / 1000000000;
        }
      
        int main(void) {
          long last = get_time();
          while(1) {
            long now = get_time();
            if (now < last) {
              printf("clock went backwards by %ld seconds!\n", last - now);
            }
            last = now;
            sleep(1);
          }
          return 0;
        }
      
      Which when run concurrently with:
       # date -s 2040-1-1
       # date -s 2037-1-1
      
      Will detect the clock going backward.
      
      The root cause is that wtom_clock_sec in struct vdso_data is only a
      32-bit signed value, even though we set its value to be equal to
      tk->wall_to_monotonic.tv_sec which is 64-bits.
      
      Because the monotonic clock starts at zero when the system boots the
      wall_to_montonic.tv_sec offset is negative for current and future
      dates. Currently on a freshly booted system the offset will be in the
      vicinity of negative 1.5 billion seconds.
      
      However if the wall clock is set past the Y2038 boundary, the offset
      from wall to monotonic becomes less than negative 2^31, and no longer
      fits in 32-bits. When that value is assigned to wtom_clock_sec it is
      truncated and becomes positive, causing the VDSO assembly code to
      calculate CLOCK_MONOTONIC incorrectly.
      
      That causes CLOCK_MONOTONIC to jump ahead by ~4 billion seconds which
      it is not meant to do. Worse, if the time is then set back before the
      Y2038 boundary CLOCK_MONOTONIC will jump backward.
      
      We can fix it simply by storing the full 64-bit offset in the
      vdso_data, and using that in the VDSO assembly code. We also shuffle
      some of the fields in vdso_data to avoid creating a hole.
      
      The original commit that added the CLOCK_MONOTONIC support to the VDSO
      did actually use a 64-bit value for wtom_clock_sec, see commit
      a7f290da ("[PATCH] powerpc: Merge vdso's and add vdso support to
      32 bits kernel") (Nov 2005). However just 3 days later it was
      converted to 32-bits in commit 0c37ec2a ("[PATCH] powerpc: vdso
      fixes (take #2)"), and the bug has existed since then AFAICS.
      
      Fixes: 0c37ec2a ("[PATCH] powerpc: vdso fixes (take #2)")
      Cc: stable@vger.kernel.org # v2.6.15+
      Link: http://lkml.kernel.org/r/HaC.ZfES.62bwlnvAvMP.1STMMj@seznam.czReported-by: NJakub Drnec <jaydee@email.cz>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b5b4453e
  8. 17 3月, 2019 3 次提交
  9. 16 3月, 2019 3 次提交
  10. 15 3月, 2019 3 次提交
    • P
      perf/x86: Fixup typo in stub functions · f764c58b
      Peter Zijlstra 提交于
      Guenter reported a build warning for CONFIG_CPU_SUP_INTEL=n:
      
        > With allmodconfig-CONFIG_CPU_SUP_INTEL, this patch results in:
        >
        > In file included from arch/x86/events/amd/core.c:8:0:
        > arch/x86/events/amd/../perf_event.h:1036:45: warning: ‘struct cpu_hw_event’ declared inside parameter list will not be visible outside of this definition or declaration
        >  static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu)
      
      While harmless (an unsed pointer is an unused pointer, no matter the type)
      it needs fixing.
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: d01b1f96 ("perf/x86/intel: Make cpuc allocations consistent")
      Link: http://lkml.kernel.org/r/20190315081410.GR5996@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f764c58b
    • P
      perf/x86/intel: Fix memory corruption · ede271b0
      Peter Zijlstra 提交于
      Through:
      
        validate_event()
          x86_pmu.get_event_constraints(.idx=-1)
            tfa_get_event_constraints()
              dyn_constraint()
      
      cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access.
      
      In this case, simply skip the TFA constraint code, there is no event
      constraint with just PMC3, therefore the code will never result in the
      empty set.
      
      Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort")
      Reported-by: NTony Jones <tonyj@suse.com>
      Reported-by: N"DSouza, Nelson" <nelson.dsouza@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NTony Jones <tonyj@suse.com>
      Tested-by: N"DSouza, Nelson" <nelson.dsouza@intel.com>
      Cc: eranian@google.com
      Cc: jolsa@redhat.com
      Cc: stable@kernel.org
      Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.org
      ede271b0
    • P
      MIPS: Remove custom MIPS32 __kernel_fsid_t type · f6cab793
      Paul Burton 提交于
      For MIPS32 kernels we have a custom definition of __kernel_fsid_t. This
      differs from the asm-generic version used by all other architectures &
      MIPS64 in one way - it declares the val field as an array of long,
      rather than an array of int. Since int & long have identical size &
      alignment when targeting MIPS32 anyway, this makes little sense.
      
      Beyond the pointlessness this causes problems for code which prints
      entries from the val array, for example the fanotify_encode_fid()
      function [1]. If such code uses a format specified suited to an int then
      it encounters compiler warnings when building for MIPS32, such as:
      
        In file included from include/linux/kernel.h:14:0,
                         from include/linux/list.h:9,
                         from include/linux/preempt.h:11,
                         from include/linux/spinlock.h:51,
                         from include/linux/fdtable.h:11,
                         from fs/notify/fanotify/fanotify.c:3:
        fs/notify/fanotify/fanotify.c: In function 'fanotify_encode_fid':
        include/linux/kern_levels.h:5:18: warning: format '%x' expects argument
          of type 'unsigned int', but argument 2 has type 'long int' [-Wformat=]
      
      Remove the custom __kernel_fsid_t definition & make use of the
      asm-generic version which will have an identical layout in memory
      anyway, in order to remove the inconsistency with other architectures.
      
      One possible regression this could cause if is any code is attempting to
      print entries from the val array with a long-sized format specifier, in
      which case it would begin seeing compiler warnings when built against
      kernel headers including this change. Since such code is exceedingly
      rare, and would have to be MIPS32-specific to expect a long, this seems
      to be a problem that it's extremely unlikely anyone will encounter.
      
      [1] https://lore.kernel.org/linux-mips/CAOQ4uxiEkczB7PNCXegFC-eYb9zAGaio_o=OgHAJHFd7eavBxA@mail.gmail.com/T/#mb43103277c79ef06b884359209e817db1c136140Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      f6cab793