1. 03 4月, 2019 10 次提交
    • M
      asm-generic/tlb: Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y · 952a31c9
      Martin Schwidefsky 提交于
      Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
      mmu_gather code. If the option is set the mmu_gather will not
      track individual pages for delayed page free anymore. A platform
      that enables the option needs to provide its own implementation
      of the __tlb_remove_page_size() function to free pages.
      
      No change in behavior intended.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: linux@armlinux.org.uk
      Cc: npiggin@gmail.com
      Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      952a31c9
    • P
      arch/tlb: Clean up simple architectures · 6137fed0
      Peter Zijlstra 提交于
      For the architectures that do not implement their own tlb_flush() but
      do already use the generic mmu_gather, there are two options:
      
       1) the platform has an efficient flush_tlb_range() and
          asm-generic/tlb.h doesn't need any overrides at all.
      
       2) the platform lacks an efficient flush_tlb_range() and
          we select MMU_GATHER_NO_RANGE to minimize full invalidates.
      
      Convert all 'simple' architectures to one of these two forms.
      
      alpha:	    has no range invalidate -> 2
      arc:	    already used flush_tlb_range() -> 1
      c6x:	    has no range invalidate -> 2
      hexagon:    has an efficient flush_tlb_range() -> 1
                  (flush_tlb_mm() is in fact a full range invalidate,
      	     so no need to shoot down everything)
      m68k:	    has inefficient flush_tlb_range() -> 2
      microblaze: has no flush_tlb_range() -> 2
      mips:	    has efficient flush_tlb_range() -> 1
      	    (even though it currently seems to use flush_tlb_mm())
      nds32:	    already uses flush_tlb_range() -> 1
      nios2:	    has inefficient flush_tlb_range() -> 2
      	    (no limit on range iteration)
      openrisc:   has inefficient flush_tlb_range() -> 2
      	    (no limit on range iteration)
      parisc:	    already uses flush_tlb_range() -> 1
      sparc32:    already uses flush_tlb_range() -> 1
      unicore32:  has inefficient flush_tlb_range() -> 2
      	    (no limit on range iteration)
      xtensa:	    has efficient flush_tlb_range() -> 1
      
      Note this also fixes a bug in the existing code for a number
      platforms. Those platforms that did:
      
        tlb_end_vma() -> if (!full_mm) flush_tlb_*()
        tlb_flush -> if (full_mm) flush_tlb_mm()
      
      missed the case of shift_arg_pages(), which doesn't have @fullmm set,
      nor calls into tlb_*vma(), but still frees page-tables and thus needs
      an invalidate. The new code handles this by detecting a non-empty
      range, and either issuing the matching range invalidate or a full
      invalidate, depending on the capabilities.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6137fed0
    • P
      um/tlb: Convert to generic mmu_gather · 7bb8709d
      Peter Zijlstra 提交于
      Generic mmu_gather provides the simple flush_tlb_range() based range
      tracking mmu_gather UM needs.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7bb8709d
    • P
      sh/tlb: Convert SH to generic mmu_gather · c5b27a88
      Peter Zijlstra 提交于
      Generic mmu_gather provides everything SH needs (range tracking and
      cache coherency).
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c5b27a88
    • P
      ia64/tlb: Convert to generic mmu_gather · e1547007
      Peter Zijlstra 提交于
      Generic mmu_gather provides everything ia64 needs (range tracking).
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e1547007
    • P
      arm/tlb: Convert to generic mmu_gather · b78180b9
      Peter Zijlstra 提交于
      Generic mmu_gather provides everything that ARM needs:
      
       - range tracking
       - RCU table free
       - VM_EXEC tracking
       - VIPT cache flushing
      
      The one notable curiosity is the 'funny' range tracking for classical
      ARM in __pte_free_tlb().
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b78180b9
    • P
      asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE · 96bc9567
      Peter Zijlstra 提交于
      Make issuing a TLB invalidate for page-table pages the normal case.
      
      The reason is twofold:
      
       - too many invalidates is safer than too few,
       - most architectures use the linux page-tables natively
         and would thus require this.
      
      Make it an opt-out, instead of an opt-in.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      96bc9567
    • P
      asm-generic/tlb, arch: Provide generic tlb_flush() based on flush_tlb_range() · 5f307be1
      Peter Zijlstra 提交于
      Provide a generic tlb_flush() implementation that relies on
      flush_tlb_range(). This is a little awkward because flush_tlb_range()
      assumes a VMA for range invalidation, but we no longer have one.
      
      Audit of all flush_tlb_range() implementations shows only vma->vm_mm
      and vma->vm_flags are used, and of the latter only VM_EXEC (I-TLB
      invalidates) and VM_HUGETLB (large TLB invalidate) are used.
      
      Therefore, track VM_EXEC and VM_HUGETLB in two more bits, and create a
      'fake' VMA.
      
      This allows architectures that have a reasonably efficient
      flush_tlb_range() to not require any additional effort.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5f307be1
    • P
      asm-generic/tlb, arch: Provide generic VIPT cache flush · e7fd28a7
      Peter Zijlstra 提交于
      The one obvious thing SH and ARM want is a sensible default for
      tlb_start_vma(). (also: https://lkml.org/lkml/2004/1/15/6 )
      
      Avoid all VIPT architectures providing their own tlb_start_vma()
      implementation and rely on architectures to provide a no-op
      flush_cache_range() when it is not relevant.
      
      This patch makes tlb_start_vma() default to flush_cache_range(), which
      should be right and sufficient. The only exceptions that I found where
      (oddly):
      
        - m68k-mmu
        - sparc64
        - unicore
      
      Those architectures appear to have flush_cache_range(), but their
      current tlb_start_vma() does not call it.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e7fd28a7
    • P
      asm-generic/tlb, arch: Provide CONFIG_HAVE_MMU_GATHER_PAGE_SIZE · ed6a7935
      Peter Zijlstra 提交于
      Move the mmu_gather::page_size things into the generic code instead of
      PowerPC specific bits.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed6a7935
  2. 29 3月, 2019 14 次提交
    • M
      x86/realmode: Make set_real_mode_mem() static inline · f560bd19
      Matteo Croce 提交于
      Remove the unused @size argument and move it into a header file, so it
      can be inlined.
      
       [ bp: Massage. ]
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NMukesh Ojha <mojha@codeaurora.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-efi <linux-efi@vger.kernel.org>
      Cc: platform-driver-x86@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190328114233.27835-1-mcroce@redhat.com
      f560bd19
    • M
      powerpc/pseries/mce: Fix misleading print for TLB mutlihit · 6f845ebe
      Mahesh Salgaonkar 提交于
      On pseries, TLB multihit are reported as D-Cache Multihit. This is because
      the wrongly populated mc_err_types[] array. Per PAPR, TLB error type is 0x04
      and mc_err_types[4] points to "D-Cache" instead of "TLB" string. Fixup the
      mc_err_types[] array.
      
      Machine check error type per PAPR:
        0x00 = Uncorrectable Memory Error (UE)
        0x01 = SLB error
        0x02 = ERAT Error
        0x04 = TLB error
        0x05 = D-Cache error
        0x07 = I-Cache error
      
      Fixes: 8f0b8056 ("powerpc/pseries: Display machine check error details.")
      Cc: stable@vger.kernel.org # v4.20+
      Reported-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6f845ebe
    • S
      KVM: x86: update %rip after emulating IO · 45def77e
      Sean Christopherson 提交于
      Most (all?) x86 platforms provide a port IO based reset mechanism, e.g.
      OUT 92h or CF9h.  Userspace may emulate said mechanism, i.e. reset a
      vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM
      that it is doing a reset, e.g. Qemu jams vCPU state and resumes running.
      
      To avoid corruping %rip after such a reset, commit 0967b7bf ("KVM:
      Skip pio instruction when it is emulated, not executed") changed the
      behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the
      instruction prior to exiting to userspace.  Full emulation doesn't need
      such tricks becase re-emulating the instruction will naturally handle
      %rip being changed to point at the reset vector.
      
      Updating %rip prior to executing to userspace has several drawbacks:
      
        - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation
          fails it will likely yell about the wrong address.
        - Single step exits to userspace for are effectively dropped as
          KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO.
        - Behavior of PIO emulation is different depending on whether it
          goes down the fast path or the slow path.
      
      Rather than skip the PIO instruction before exiting to userspace,
      snapshot the linear %rip and cancel PIO completion if the current
      value does not match the snapshot.  For a 64-bit vCPU, i.e. the most
      common scenario, the snapshot and comparison has negligible overhead
      as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra
      VMREAD in this case.
      
      All other alternatives to snapshotting the linear %rip that don't
      rely on an explicit reset announcenment suffer from one corner case
      or another.  For example, canceling PIO completion on any write to
      %rip fails if userspace does a save/restore of %rip, and attempting to
      avoid that issue by canceling PIO only if %rip changed then fails if PIO
      collides with the reset %rip.  Attempting to zero in on the exact reset
      vector won't work for APs, which means adding more hooks such as the
      vCPU's MP_STATE, and so on and so forth.
      
      Checking for a linear %rip match technically suffers from corner cases,
      e.g. userspace could theoretically rewrite the underlying code page and
      expect a different instruction to execute, or the guest hardcodes a PIO
      reset at 0xfffffff0, but those are far, far outside of what can be
      considered normal operation.
      
      Fixes: 432baf60 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O")
      Cc: <stable@vger.kernel.org>
      Reported-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      45def77e
    • V
      x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init · 013cc6eb
      Vitaly Kuznetsov 提交于
      When userspace initializes guest vCPUs it may want to zero all supported
      MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/
      HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5 ("kvm/x86: Update SynIC
      timers on guest entry only") we began doing stimer_mark_pending()
      unconditionally on every config change.
      
      The issue I'm observing manifests itself as following:
      - Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as
        pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER;
      - kvm_hv_has_stimer_pending() starts returning true;
      - kvm_vcpu_has_events() starts returning true;
      - kvm_arch_vcpu_runnable() starts returning true;
      - when kvm_arch_vcpu_ioctl_run() gets into
        (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case:
        - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns
          immediately, avoiding normal wait path;
        - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing
          userspace to retry.
      
      So instead of normal wait path we get a busy loop on all secondary vCPUs
      before they get INIT signal. This seems to be undesirable, especially given
      that this happens even when Hyper-V extensions are not used.
      
      Generally, it seems to be pointless to mark an stimer as pending in
      stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing
      kvm_hv_process_stimers() will do is clear the corresponding bit. We may
      just not mark disabled timers as pending instead.
      
      Fixes: f3b138c5 ("kvm/x86: Update SynIC timers on guest entry only")
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      013cc6eb
    • X
      kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs · 2bdb76c0
      Xiaoyao Li 提交于
      Since MSR_IA32_ARCH_CAPABILITIES is emualted unconditionally even if
      host doesn't suppot it. We should move it to array emulated_msrs from
      arry msrs_to_save, to report to userspace that guest support this msr.
      Signed-off-by: NXiaoyao Li <xiaoyao.li@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2bdb76c0
    • S
      KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 0cf9135b
      Sean Christopherson 提交于
      The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: NXiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0cf9135b
    • B
      kvm: mmu: Used range based flushing in slot_handle_level_range · f285c633
      Ben Gardon 提交于
      Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address
      in slot_handle_level_range. When range based flushes are not enabled
      kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs.
      
      This changes the behavior of many functions that indirectly use
      slot_handle_level_range, iff the range based flushes are enabled. The
      only potential problem I see with this is that kvm->tlbs_dirty will be
      cleared less often, however the only caller of slot_handle_level_range that
      checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which
      checks it and does a kvm_flush_remote_tlbs after calling
      kvm_unmap_hva_range anyway.
      
      Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and
      	without this patch. The patch introduced no new failures.
      Signed-off-by: NBen Gardon <bgardon@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f285c633
    • M
      KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported · 3d9683cf
      Masahiro Yamada 提交于
      I do not see any consistency about headers_install of <linux/kvm_para.h>
      and <asm/kvm_para.h>.
      
      According to my analysis of Linux 5.1-rc1, there are 3 groups:
      
       [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
      
          alpha, arm, hexagon, mips, powerpc, s390, sparc, x86
      
       [2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not
      
          arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc,
          parisc, sh, unicore32, xtensa
      
       [3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
      
          csky, nds32, riscv
      
      This does not match to the actual KVM support. At least, [2] is
      half-baked.
      
      Nor do arch maintainers look like they care about this. For example,
      commit 0add5371 ("microblaze: Add missing kvm_para.h to Kbuild")
      exported <asm/kvm_para.h> to user-space in order to fix an in-kernel
      build error.
      
      We have two ways to make this consistent:
      
       [A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all
           architectures, irrespective of the KVM support
      
       [B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h>
           to the KVM support
      
      My first attempt was [A] because the code looks cleaner, but Paolo
      suggested [B].
      
      So, this commit goes with [B].
      
      For most architectures, <asm/kvm_para.h> was moved to the kernel-space.
      I changed include/uapi/linux/Kbuild so that it checks generated
      asm/kvm_para.h as well as check-in ones.
      
      After this commit, there will be two groups:
      
       [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
      
          arm, arm64, mips, powerpc, s390, x86
      
       [2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
      
          alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze,
          nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NCornelia Huck <cohuck@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3d9683cf
    • W
      KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() · 4d66623c
      Wei Yang 提交于
      * nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is
        non-zero.
      
      * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages()
        never return zero.
      
      Based on these two reasons, we can merge the two *if* clause and use the
      return value from kvm_mmu_calculate_mmu_pages() directly. This simplify
      the code and also eliminate the possibility for reader to believe
      nr_mmu_pages would be zero.
      Signed-off-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4d66623c
    • K
      kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields · 711eff3a
      Krish Sadhukhan 提交于
      According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
      following check is performed on vmentry of L2 guests:
      
          On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP
          field and the IA32_SYSENTER_EIP field must each contain a canonical
          address.
      Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      711eff3a
    • S
      KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) · 05d5a486
      Singh, Brijesh 提交于
      Errata#1096:
      
      On a nested data page fault when CR.SMAP=1 and the guest data read
      generates a SMAP violation, GuestInstrBytes field of the VMCB on a
      VMEXIT will incorrectly return 0h instead the correct guest
      instruction bytes .
      
      Recommend Workaround:
      
      To determine what instruction the guest was executing the hypervisor
      will have to decode the instruction at the instruction pointer.
      
      The recommended workaround can not be implemented for the SEV
      guest because guest memory is encrypted with the guest specific key,
      and instruction decoder will not be able to decode the instruction
      bytes. If we hit this errata in the SEV guest then log the message
      and request a guest shutdown.
      Reported-by: NVenkatesh Srinivas <venkateshs@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      05d5a486
    • S
      KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' · 47c42e6b
      Sean Christopherson 提交于
      The cr4_pae flag is a bit of a misnomer, its purpose is really to track
      whether the guest PTE that is being shadowed is a 4-byte entry or an
      8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
      reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
      but it was mostly harmless since the size of the gpte never mattered.
      Now that a spte may be tracking an indirect EPT entry, relying on
      CR4.PAE is wrong and ill-named.
      
      For direct shadow pages, force the gpte_size to '1' as they are always
      8-byte entries; EPT entries can only be 8-bytes and KVM always uses
      8-byte entries for NPT and its identity map (when running with EPT but
      not unrestricted guest).
      
      Likewise, nested EPT entries are always 8-bytes.  Nested EPT presents a
      unique scenario as the size of the entries are not dictated by CR4.PAE,
      but neither is the shadow page a direct map.  To handle this scenario,
      set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
      to denote a nested EPT shadow page.  Use the information to avoid
      incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
      
      Providing a consistent and accurate gpte_size fixes a bug reported by
      Vitaly where fast_cr3_switch() always fails when switching from L2 to
      L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
      whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      47c42e6b
    • S
      KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT · 552c69b1
      Sean Christopherson 提交于
      Explicitly zero out quadrant and invalid instead of inheriting them from
      the root_mmu.  Functionally, this patch is a nop as we (should) never
      set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only
      allowed if EPT is used for L1, and the root_mmu will never be invalid at
      this point.
      
      Explicitly setting flags sets the stage for repurposing the legacy
      paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which
      point 'smm' would be the only flag to be inherited from root_mmu.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      552c69b1
    • J
      x86/cpufeature: Fix __percpu annotation in this_cpu_has() · f6027c81
      Jann Horn 提交于
      &cpu_info.x86_capability is __percpu, and the second argument of
      x86_this_cpu_test_bit() is expected to be __percpu. Don't cast the
      __percpu away and then implicitly add it again. This gets rid of 106
      lines of sparse warnings with the kernel config I'm using.
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190328154948.152273-1-jannh@google.com
      f6027c81
  3. 28 3月, 2019 5 次提交
    • R
      x86/mm: Don't exceed the valid physical address space · 92c77f7c
      Ralph Campbell 提交于
      valid_phys_addr_range() is used to sanity check the physical address range
      of an operation, e.g., access to /dev/mem. It uses __pa(high_memory)
      internally.
      
      If memory is populated at the end of the physical address space, then
      __pa(high_memory) is outside of the physical address space because:
      
         high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
      
      For the comparison in valid_phys_addr_range() this is not an issue, but if
      CONFIG_DEBUG_VIRTUAL is enabled, __pa() maps to __phys_addr(), which
      verifies that the resulting physical address is within the valid physical
      address space of the CPU. So in the case that memory is populated at the
      end of the physical address space, this is not true and triggers a
      VIRTUAL_BUG_ON().
      
      Use __pa(high_memory - 1) to prevent the conversion from going beyond
      the end of valid physical addresses.
      
      Fixes: be62a320 ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses")
      Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Craig Bergstrom <craigb@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Sean Young <sean@mess.org>
      
      Link: https://lkml.kernel.org/r/20190326001817.15413-2-rcampbell@nvidia.com
      92c77f7c
    • D
      x86/retpolines: Disable switch jump tables when retpolines are enabled · a9d57ef1
      Daniel Borkmann 提交于
      Commit ce02ef06 ("x86, retpolines: Raise limit for generating indirect
      calls from switch-case") raised the limit under retpolines to 20 switch
      cases where gcc would only then start to emit jump tables, and therefore
      effectively disabling the emission of slow indirect calls in this area.
      
      After this has been brought to attention to gcc folks [0], Martin Liska
      has then fixed gcc to align with clang by avoiding to generate switch jump
      tables entirely under retpolines. This is taking effect in gcc starting
      from stable version 8.4.0. Given kernel supports compilation with older
      versions of gcc where the fix is not being available or backported anymore,
      we need to keep the extra KBUILD_CFLAGS around for some time and generally
      set the -fno-jump-tables to align with what more recent gcc is doing
      automatically today.
      
      More than 20 switch cases are not expected to be fast-path critical, but
      it would still be good to align with gcc behavior for versions < 8.4.0 in
      order to have consistency across supported gcc versions. vmlinux size is
      slightly growing by 0.27% for older gcc. This flag is only set to work
      around affected gcc, no change for clang.
      
        [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952Suggested-by: NMartin Liska <mliska@suse.cz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Björn Töpel<bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net
      a9d57ef1
    • T
      x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y · bebd024e
      Thomas Gleixner 提交于
      The SMT disable 'nosmt' command line argument is not working properly when
      CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are
      required to be brought up due to the MCE issues, cannot work. The CPUs are
      then kept in a half dead state.
      
      As the 'nosmt' functionality has become popular due to the speculative
      hardware vulnerabilities, the half torn down state is not a proper solution
      to the problem.
      
      Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
      possible.
      Reported-by: NTianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de
      bebd024e
    • T
      s390/cpumf: Fix warning from check_processor_id · b6ffdf27
      Thomas Richter 提交于
      Function __hw_perf_event_init() used a CPU variable without
      ensuring CPU preemption has been disabled. This caused the
      following warning in the kernel log:
      
        [ 7.277085] BUG: using smp_processor_id() in preemptible
                       [00000000] code: cf-csdiag/1892
        [ 7.277111] caller is cf_diag_event_init+0x13a/0x338
        [ 7.277122] CPU: 10 PID: 1892 Comm: cf-csdiag Not tainted
                       5.0.0-20190318.rc0.git0.9e1a11e0f602.300.fc29.s390x+debug #1
        [ 7.277131] Hardware name: IBM 2964 NC9 712 (LPAR)
        [ 7.277139] Call Trace:
        [ 7.277150] ([<000000000011385a>] show_stack+0x82/0xd0)
        [ 7.277161]  [<0000000000b7a71a>] dump_stack+0x92/0xd0
        [ 7.277174]  [<00000000007b7e9c>] check_preemption_disabled+0xe4/0x100
        [ 7.277183]  [<00000000001228aa>] cf_diag_event_init+0x13a/0x338
        [ 7.277195]  [<00000000002cf3aa>] perf_try_init_event+0x72/0xf0
        [ 7.277204]  [<00000000002d0bba>] perf_event_alloc+0x6fa/0xce0
        [ 7.277214]  [<00000000002dc4a8>] __s390x_sys_perf_event_open+0x398/0xd50
        [ 7.277224]  [<0000000000b9e8f0>] system_call+0xdc/0x2d8
        [ 7.277233] 2 locks held by cf-csdiag/1892:
        [ 7.277241]  #0: 00000000976f5510 (&sig->cred_guard_mutex){+.+.},
                        at: __s390x_sys_perf_event_open+0xd2e/0xd50
        [ 7.277257]  #1: 00000000363b11bd (&pmus_srcu){....},
                        at: perf_event_alloc+0x52e/0xce0
      
      The variable is now accessed in proper context. Use
      get_cpu_var()/put_cpu_var() pair to disable
      preemption during access.
      As the hardware authorization settings apply to all CPUs, it
      does not matter which CPU is used to check the authorization setting.
      
      Remove the event->count assignment. It is not needed as function
      perf_event_alloc() allocates memory for the event with kzalloc() and
      thus count is already set to zero.
      
      Fixes: fe5908bc ("s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace")
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      b6ffdf27
    • C
      arm64: replace memblock_alloc_low with memblock_alloc · 9e0a17db
      Chen Zhou 提交于
      If we use "crashkernel=Y[@X]" and the start address is above 4G,
      the arm64 kdump capture kernel may call memblock_alloc_low() failure
      in request_standard_resources(). Replacing memblock_alloc_low() with
      memblock_alloc().
      
      [    0.000000] MEMBLOCK configuration:
      [    0.000000]  memory size = 0x0000000040650000 reserved size = 0x0000000004db7f39
      [    0.000000]  memory.cnt  = 0x6
      [    0.000000]  memory[0x0]	[0x00000000395f0000-0x000000003968ffff], 0x00000000000a0000 bytes on node 0 flags: 0x4
      [    0.000000]  memory[0x1]	[0x0000000039730000-0x000000003973ffff], 0x0000000000010000 bytes on node 0 flags: 0x4
      [    0.000000]  memory[0x2]	[0x0000000039780000-0x000000003986ffff], 0x00000000000f0000 bytes on node 0 flags: 0x4
      [    0.000000]  memory[0x3]	[0x0000000039890000-0x0000000039d0ffff], 0x0000000000480000 bytes on node 0 flags: 0x4
      [    0.000000]  memory[0x4]	[0x000000003ed00000-0x000000003ed2ffff], 0x0000000000030000 bytes on node 0 flags: 0x4
      [    0.000000]  memory[0x5]	[0x0000002040000000-0x000000207fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
      [    0.000000]  reserved.cnt  = 0x7
      [    0.000000]  reserved[0x0]	[0x0000002040080000-0x0000002041c4dfff], 0x0000000001bce000 bytes flags: 0x0
      [    0.000000]  reserved[0x1]	[0x0000002041c53000-0x0000002042c203f8], 0x0000000000fcd3f9 bytes flags: 0x0
      [    0.000000]  reserved[0x2]	[0x000000207da00000-0x000000207dbfffff], 0x0000000000200000 bytes flags: 0x0
      [    0.000000]  reserved[0x3]	[0x000000207ddef000-0x000000207fbfffff], 0x0000000001e11000 bytes flags: 0x0
      [    0.000000]  reserved[0x4]	[0x000000207fdf2b00-0x000000207fdfc03f], 0x0000000000009540 bytes flags: 0x0
      [    0.000000]  reserved[0x5]	[0x000000207fdfd000-0x000000207ffff3ff], 0x0000000000202400 bytes flags: 0x0
      [    0.000000]  reserved[0x6]	[0x000000207ffffe00-0x000000207fffffff], 0x0000000000000200 bytes flags: 0x0
      [    0.000000] Kernel panic - not syncing: request_standard_resources: Failed to allocate 384 bytes
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-next-20190321+ #4
      [    0.000000] Call trace:
      [    0.000000]  dump_backtrace+0x0/0x188
      [    0.000000]  show_stack+0x24/0x30
      [    0.000000]  dump_stack+0xa8/0xcc
      [    0.000000]  panic+0x14c/0x31c
      [    0.000000]  setup_arch+0x2b0/0x5e0
      [    0.000000]  start_kernel+0x90/0x52c
      [    0.000000] ---[ end Kernel panic - not syncing: request_standard_resources: Failed to allocate 384 bytes ]---
      
      Link: https://www.spinics.net/lists/arm-kernel/msg715293.htmlSigned-off-by: NChen Zhou <chenzhou10@huawei.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      9e0a17db
  4. 27 3月, 2019 3 次提交
  5. 26 3月, 2019 2 次提交
  6. 25 3月, 2019 3 次提交
    • S
      ARM: davinci: fix build failure with allnoconfig · 2dbed152
      Sekhar Nori 提交于
      allnoconfig build with just ARCH_DAVINCI enabled
      fails because drivers/clk/davinci/* depends on
      REGMAP being enabled.
      
      Fix it by selecting REGMAP_MMIO when building in
      DaVinci support.
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Reviewed-by: NDavid Lechner <david@lechnology.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      2dbed152
    • M
      powerpc/64: Fix memcmp reading past the end of src/dest · d9470757
      Michael Ellerman 提交于
      Chandan reported that fstests' generic/026 test hit a crash:
      
        BUG: Unable to handle kernel data access at 0xc00000062ac40000
        Faulting instruction address: 0xc000000000092240
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
        CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1
        NIP:  c000000000092240 LR: c00000000066a55c CTR: 0000000000000000
        REGS: c00000062c0c3430 TRAP: 0300   Not tainted  (5.0.0-rc2-next-20190115-00001-g6de6dba64dda)
        MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000842  XER: 20000000
        CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0
        GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660
        GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4
        GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0
        GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000
        GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002
        GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000
        GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000
        NIP memcmp+0x120/0x690
        LR  xfs_attr3_leaf_lookup_int+0x53c/0x5b0
        Call Trace:
          xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
          xfs_da3_node_lookup_int+0x32c/0x5a0
          xfs_attr_node_addname+0x170/0x6b0
          xfs_attr_set+0x2ac/0x340
          __xfs_set_acl+0xf0/0x230
          xfs_set_acl+0xd0/0x160
          set_posix_acl+0xc0/0x130
          posix_acl_xattr_set+0x68/0x110
          __vfs_setxattr+0xa4/0x110
          __vfs_setxattr_noperm+0xac/0x240
          vfs_setxattr+0x128/0x130
          setxattr+0x248/0x600
          path_setxattr+0x108/0x120
          sys_setxattr+0x28/0x40
          system_call+0x5c/0x70
        Instruction dump:
        7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000
        4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040
      
      The instruction dump decodes as:
        subfic  r6,r5,8
        rlwinm  r6,r6,3,0,28
        ldbrx   r9,0,r3
        ldbrx   r10,0,r4      <-
      
      Which shows us doing an 8 byte load from c00000062ac3fff9, which
      crosses the page boundary at c00000062ac40000 and faults.
      
      It's not OK for memcmp to read past the end of the source or
      destination buffers if that would cross a page boundary, because we
      don't know that the next page is mapped.
      
      As pointed out by Segher, we can read past the end of the source or
      destination as long as we don't cross a 4K boundary, because that's
      our minimum page size on all platforms.
      
      The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
      there we know that s1 is 8-byte aligned and we have at least 1 byte to
      read, so a single 8-byte load won't read past the end of s1 and cross
      a page boundary.
      
      But we have to be more careful with s2. So check if it's within 8
      bytes of a 4K boundary and if so go to the byte-by-byte loop.
      
      Fixes: 2d9ee327 ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
      Cc: stable@vger.kernel.org # v4.19+
      Reported-by: NChandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NSegher Boessenkool <segher@kernel.crashing.org>
      Tested-by: NChandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d9470757
    • P
      x86/resctrl: Remove unused variable · 7f2daa96
      Peng Hao 提交于
      Variable "struct rdt_resource *r" is set but not used. So remove it.
      Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1552152584-26087-1-git-send-email-peng.hao2@zte.com.cn
      7f2daa96
  7. 23 3月, 2019 2 次提交
  8. 22 3月, 2019 1 次提交
    • V
      x86/mm/pti: Make local symbols static · 4fe64a62
      Valdis Kletnieks 提交于
      With 'make C=2 W=1', sparse and gcc both complain:
      
        CHECK   arch/x86/mm/pti.c
      arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
      arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
        CC      arch/x86/mm/pti.o
      arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
        605 | void pti_set_kernel_image_nonglobal(void)
            |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
      drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
      local (static) symbol.
      
      Make both static.
      Signed-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
      
      4fe64a62
新手
引导
客服 返回
顶部