1. 10 6月, 2020 6 次提交
    • M
      mm: consolidate pte_index() and pte_offset_*() definitions · 974b9b2c
      Mike Rapoport 提交于
      All architectures define pte_index() as
      
      	(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
      
      and all architectures define pte_offset_kernel() as an entry in the array
      of PTEs indexed by the pte_index().
      
      For the most architectures the pte_offset_kernel() implementation relies
      on the availability of pmd_page_vaddr() that converts a PMD entry value to
      the virtual address of the page containing PTEs array.
      
      Let's move x86 definitions of the PTE accessors to the generic place in
      <linux/pgtable.h> and then simply drop the respective definitions from the
      other architectures.
      
      The architectures that didn't provide pmd_page_vaddr() are updated to have
      that defined.
      
      The generic implementation of pte_offset_kernel() can be overridden by an
      architecture and alpha makes use of this because it has special ordering
      requirements for its version of pte_offset_kernel().
      
      [rppt@linux.ibm.com: v2]
        Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
      [rppt@linux.ibm.com: update]
        Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
      [rppt@linux.ibm.com: update]
        Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
      [akpm@linux-foundation.org: fix x86 warning]
      [sfr@canb.auug.org.au: fix powerpc build]
        Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      974b9b2c
    • M
      x86/mm: simplify init_trampoline() and surrounding logic · 88107d33
      Mike Rapoport 提交于
      There are three cases for the trampoline initialization:
      * 32-bit does nothing
      * 64-bit with kaslr disabled simply copies a PGD entry from the direct map
        to the trampoline PGD
      * 64-bit with kaslr enabled maps the real mode trampoline at PUD level
      
      These cases are currently differentiated by a bunch of ifdefs inside
      asm/include/pgtable.h and the case of 64-bits with kaslr on uses
      pgd_index() helper.
      
      Replacing the ifdefs with a static function in arch/x86/mm/init.c gives
      clearer code and allows moving pgd_index() to the generic implementation
      in include/linux/pgtable.h
      
      [rppt@linux.ibm.com: take CONFIG_RANDOMIZE_MEMORY into account in kaslr_enabled()]
        Link: http://lkml.kernel.org/r/20200525104045.GB13212@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-8-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88107d33
    • M
      mm: reorder includes after introduction of linux/pgtable.h · 65fddcfc
      Mike Rapoport 提交于
      The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
      of the latter in the middle of asm includes.  Fix this up with the aid of
      the below script and manual adjustments here and there.
      
      	import sys
      	import re
      
      	if len(sys.argv) is not 3:
      	    print "USAGE: %s <file> <header>" % (sys.argv[0])
      	    sys.exit(1)
      
      	hdr_to_move="#include <linux/%s>" % sys.argv[2]
      	moved = False
      	in_hdrs = False
      
      	with open(sys.argv[1], "r") as f:
      	    lines = f.readlines()
      	    for _line in lines:
      		line = _line.rstrip('
      ')
      		if line == hdr_to_move:
      		    continue
      		if line.startswith("#include <linux/"):
      		    in_hdrs = True
      		elif not moved and in_hdrs:
      		    moved = True
      		    print hdr_to_move
      		print line
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65fddcfc
    • M
      mm: introduce include/linux/pgtable.h · ca5999fd
      Mike Rapoport 提交于
      The include/linux/pgtable.h is going to be the home of generic page table
      manipulation functions.
      
      Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
      make the latter include asm/pgtable.h.
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca5999fd
    • M
      mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport 提交于
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definition of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always
      possibility to override the generic version with the usual ifdefs magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are remove with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e31cf2f4
    • D
      x86: add missing const qualifiers for log_lvl · d46b3df7
      Dmitry Safonov 提交于
      Currently, the log-level of show_stack() depends on a platform
      realization.  It creates situations where the headers are printed with
      lower log level or higher than the stacktrace (depending on a platform or
      user).
      
      Furthermore, it forces the logic decision from user to an architecture
      side.  In result, some users as sysrq/kdb/etc are doing tricks with
      temporary rising console_loglevel while printing their messages.  And in
      result it not only may print unwanted messages from other CPUs, but also
      omit printing at all in the unlucky case where the printk() was deferred.
      
      Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems an easier
      approach than introducing more printk buffers.  Also, it will consolidate
      printings with headers.
      
      Keep log_lvl const show_trace_log_lvl() and printk_stack_address() as the
      new generic show_stack_loglvl() wants to have a proper const qualifier.
      
      And gcc rightfully produces warnings in case it's not keept:
      arch/x86/kernel/dumpstack.c: In function `show_stack':
      arch/x86/kernel/dumpstack.c:294:37: warning: passing argument 4 of `show_trace_log_lv ' discards `const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        294 |  show_trace_log_lvl(task, NULL, sp, loglvl);
            |                                     ^~~~~~
      arch/x86/kernel/dumpstack.c:163:32: note: expected `char *' but argument is of type `const char *'
        163 |    unsigned long *stack, char *log_lvl)
            |                          ~~~~~~^~~~~~~
      
      [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/T/#uSigned-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20200418201944.482088-41-dima@arista.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d46b3df7
  2. 09 6月, 2020 1 次提交
  3. 05 6月, 2020 9 次提交
  4. 04 6月, 2020 3 次提交
  5. 03 6月, 2020 6 次提交
  6. 01 6月, 2020 10 次提交
    • J
      x86/kvm/hyper-v: Add support for synthetic debugger interface · f97f5a56
      Jon Doron 提交于
      Add support for Hyper-V synthetic debugger (syndbg) interface.
      The syndbg interface is using MSRs to emulate a way to send/recv packets
      data.
      
      The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled
      and if it supports the synthetic debugger interface it will attempt to
      use it, instead of trying to initialize a network adapter.
      Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NJon Doron <arilou@gmail.com>
      Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f97f5a56
    • J
      x86/hyper-v: Add synthetic debugger definitions · 22ad0026
      Jon Doron 提交于
      Hyper-V synthetic debugger has two modes, one that uses MSRs and
      the other that use Hypercalls.
      
      Add all the required definitions to both types of synthetic debugger
      interface.
      
      Some of the required new CPUIDs and MSRs are not documented in the TLFS
      so they are in hyperv.h instead.
      
      The reason they are not documented is because they are subjected to be
      removed in future versions of Windows.
      Reviewed-by: NMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: NJon Doron <arilou@gmail.com>
      Message-Id: <20200529134543.1127440-3-arilou@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      22ad0026
    • L
      KVM: x86/pmu: Support full width counting · 27461da3
      Like Xu 提交于
      Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0)
      for GP counters that allows writing the full counter width. Enable this
      range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]).
      
      The guest would query CPUID to get the counter width, and sign extends
      the counter values as needed. The traditional MSRs always limit to 32bit,
      even though the counter internally is larger (48 or 57 bits).
      
      When the new capability is set, use the alternative range which do not
      have these restrictions. This lowers the overhead of perf stat slightly
      because it has to do less interrupts to accumulate the counter value.
      Signed-off-by: NLike Xu <like.xu@linux.intel.com>
      Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      27461da3
    • V
      KVM: x86: acknowledgment mechanism for async pf page ready notifications · 557a961a
      Vitaly Kuznetsov 提交于
      If two page ready notifications happen back to back the second one is not
      delivered and the only mechanism we currently have is
      kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
      only be performed with the next vmexit when it happens and in some cases
      it may take a while. With interrupt based page ready notification delivery
      the situation is even worse: unlike exceptions, interrupts are not handled
      immediately so we must check if the slot is empty. This is slow and
      unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
      the fact that the slot is free and host should check its notification
      queue. Mandate using it for interrupt based 'page ready' APF event
      delivery.
      
      As kvm_check_async_pf_completion() is going away from vcpu_run() we need
      a way to communicate the fact that vcpu->async_pf.done queue has
      transitioned from empty to non-empty state. Introduce
      kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-7-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      557a961a
    • V
      KVM: x86: interrupt based APF 'page ready' event delivery · 2635b5c4
      Vitaly Kuznetsov 提交于
      Concerns were expressed around APF delivery via synthetic #PF exception as
      in some cases such delivery may collide with real page fault. For 'page
      ready' notifications we can easily switch to using an interrupt instead.
      Introduce new MSR_KVM_ASYNC_PF_INT mechanism and deprecate the legacy one.
      
      One notable difference between the two mechanisms is that interrupt may not
      get handled immediately so whenever we would like to deliver next event
      (regardless of its type) we must be sure the guest had read and cleared
      previous event in the slot.
      
      While on it, get rid on 'type 1/type 2' names for APF events in the
      documentation as they are causing confusion. Use 'page not present'
      and 'page ready' everywhere instead.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-6-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2635b5c4
    • V
      KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() · 7c0ade6c
      Vitaly Kuznetsov 提交于
      An innocent reader of the following x86 KVM code:
      
      bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
      {
              if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
                      return true;
      ...
      
      may get very confused: if APF mechanism is not enabled, why do we report
      that we 'can inject async page present'? In reality, upon injection
      kvm_arch_async_page_present() will check the same condition again and,
      in case APF is disabled, will just drop the item. This is fine as the
      guest which deliberately disabled APF doesn't expect to get any APF
      notifications.
      
      Rename kvm_arch_can_inject_async_page_present() to
      kvm_arch_can_dequeue_async_page_present() to make it clear what we are
      checking: if the item can be dequeued (meaning either injected or just
      dropped).
      
      On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so
      the rename doesn't matter much.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-4-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7c0ade6c
    • V
      KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info · 68fd66f1
      Vitaly Kuznetsov 提交于
      Currently, APF mechanism relies on the #PF abuse where the token is being
      passed through CR2. If we switch to using interrupts to deliver page-ready
      notifications we need a different way to pass the data. Extent the existing
      'struct kvm_vcpu_pv_apf_data' with token information for page-ready
      notifications.
      
      While on it, rename 'reason' to 'flags'. This doesn't change the semantics
      as we only have reasons '1' and '2' and these can be treated as bit flags
      but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery
      making 'reason' name misleading.
      
      The newly introduced apf_put_user_ready() temporary puts both flags and
      token information, this will be changed to put token only when we switch
      to interrupt based notifications.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-3-vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      68fd66f1
    • P
      KVM: nSVM: remove HF_HIF_MASK · 08245e6d
      Paolo Bonzini 提交于
      The L1 flags can be found in the save area of svm->nested.hsave, fish
      it from there so that there is one fewer thing to migrate.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      08245e6d
    • P
      KVM: nSVM: remove HF_VINTR_MASK · e9fd761a
      Paolo Bonzini 提交于
      Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can
      use it instead of vcpu->arch.hflags to check whether L2 is running
      in V_INTR_MASKING mode.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e9fd761a
    • P
      KVM: nSVM: remove trailing padding for struct vmcb_control_area · 7923ef4f
      Paolo Bonzini 提交于
      Allow placing the VMCB structs on the stack or in other structs without
      wasting too much space.  Add BUILD_BUG_ON as a quick safeguard against typos.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7923ef4f
  7. 30 5月, 2020 2 次提交
  8. 29 5月, 2020 3 次提交
    • J
      x86/ioperm: Prevent a memory leak when fork fails · 4bfe6cce
      Jay Lang 提交于
      In the copy_process() routine called by _do_fork(), failure to allocate
      a PID (or further along in the function) will trigger an invocation to
      exit_thread(). This is done to clean up from an earlier call to
      copy_thread_tls(). Naturally, the child task is passed into exit_thread(),
      however during the process, io_bitmap_exit() nullifies the parent's
      io_bitmap rather than the child's.
      
      As copy_thread_tls() has been called ahead of the failure, the reference
      count on the calling thread's io_bitmap is incremented as we would expect.
      However, io_bitmap_exit() doesn't accept any arguments, and thus assumes
      it should trash the current thread's io_bitmap reference rather than the
      child's. This is pretty sneaky in practice, because in all instances but
      this one, exit_thread() is called with respect to the current task and
      everything works out.
      
      A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to
      get a bitmap allocated, and force a clone3() syscall to fail by passing
      in a zeroed clone_args structure. The kernel handles the erroneous struct
      and the buggy code path is followed, and even though the parent's reference
      to the io_bitmap is trashed, the child still holds a reference and thus
      the structure will never be freed.
      
      Fix this by tweaking io_bitmap_exit() and its subroutines to accept a
      task_struct argument which to operate on.
      
      Fixes: ea5f1cd7 ("x86/ioperm: Remove bitmap if all permissions dropped")
      Signed-off-by: NJay Lang <jaytlang@mit.edu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable#@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
      4bfe6cce
    • W
      x86/spinlock: Remove obsolete ticket spinlock macros and types · 2ca41f55
      Waiman Long 提交于
      Even though the x86 ticket spinlock code has been removed with
      
        cfd8983f ("x86, locking/spinlocks: Remove ticket (spin)lock implementation")
      
      a while ago, there are still some ticket spinlock specific macros and
      types left in the asm/spinlock_types.h header file that are no longer
      used. Remove those as well to avoid confusion.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200526122014.25241-1-longman@redhat.com
      2ca41f55
    • A
      x86/dma: Fix max PFN arithmetic overflow on 32 bit systems · 88743470
      Alexander Dahl 提交于
      The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
      4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.
      The patch does not change the later overall result of 0x100000 for
      MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new
      calculation yields the same result, but does not require 64 bit
      arithmetic.
      
      On 32 bit systems the old calculation suffers from an arithmetic
      overflow in that intermediate term in braces: 4UL aka unsigned long int
      is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does
      not fit in 4 bytes), the in braces result is truncated to zero, the
      following right shift does not alter that, so MAX_DMA32_PFN evaluates to
      0 on 32 bit systems.
      
      That wrong value is a problem in a comparision against MAX_DMA32_PFN in
      the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if
      swiotlb should be active.  That comparison yields the opposite result,
      when compiling on 32 bit systems.
      
      This was not possible before
      
        1b7e03ef ("x86, NUMA: Enable emulation on 32bit too")
      
      when that MAX_DMA32_PFN was first made visible to x86_32 (and which
      landed in v3.0).
      
      In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on
      x86-32.
      
      However if one has set CONFIG_IOMMU_INTEL, since
      
        c5a5dc4c ("iommu/vt-d: Don't switch off swiotlb if bounce page is used")
      
      there's a dependency on CONFIG_SWIOTLB, which was not necessarily
      active before. That landed in v5.4, where we noticed it in the fli4l
      Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64
      bit kernel configs there (I could not find out why, so let's just say
      historical reasons).
      
      The effect is at boot time 64 MiB (default size) were allocated for
      bounce buffers now, which is a noticeable amount of memory on small
      systems like pcengines ALIX 2D3 with 256 MiB memory, which are still
      frequently used as home routers.
      
      We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4
      (LTS) in fli4l and got that kernel messages for example:
      
        Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018
        …
        Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem)
        …
        PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
        software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)
      
      The initial analysis and the suggested fix was done by user 'sourcejedi'
      at stackoverflow and explicitly marked as GPLv2 for inclusion in the
      Linux kernel:
      
        https://unix.stackexchange.com/a/520525/50007
      
      The new calculation, which does not suffer from that overflow, is the
      same as for arch/mips now as suggested by Robin Murphy.
      
      The fix was tested by fli4l users on round about two dozen different
      systems, including both 32 and 64 bit archs, bare metal and virtualized
      machines.
      
       [ bp: Massage commit message. ]
      
      Fixes: 1b7e03ef ("x86, NUMA: Enable emulation on 32bit too")
      Reported-by: NAlan Jenkins <alan.christopher.jenkins@gmail.com>
      Suggested-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NAlexander Dahl <post@lespocky.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: https://unix.stackexchange.com/q/520065/50007
      Link: https://web.nettworks.org/bugs/browse/FFL-2560
      Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
      88743470