1. 08 Oct, 2016: 8 commits
    • x86/pkeys: Make protection keys an "eager" feature · d4b05923
      Committed by Dave Hansen
      Our XSAVE features are divided into two categories: those that
      generate FPU exceptions, and those that do not.  MPX and pkeys do
      not generate FPU exceptions and thus cannot be used lazily.  We
      disable them when lazy mode is forced on.
      
      We have a pair of masks to collect these two sets of features, but
      XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY.
      Fix it by moving the feature to XFEATURE_MASK_EAGER.
      
      Note: this only causes problems if you boot with lazy FPU mode
      (eagerfpu=off), which is *not* the default.  It also only affects
      hardware which is not currently publicly available.  It looks like
      lazy mode is going away, but we still need this patch applied
      to any kernel that has protection keys and lazy mode, which is 4.6
      through 4.8 at this point, and 4.9 if the lazy removal isn't sent
      to Linus for 4.9.
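
      A minimal C sketch of the two masks after the fix, assuming the
      standard XSAVE state-component bit numbers (PKRU is component 9).
      The helper usable_xfeatures() is hypothetical and only models the
      rule stated above: when lazy mode is forced on, every feature in the
      eager mask is disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the XSAVE state-component numbering:
 * x87 = 0, SSE = 1, AVX = 2, MPX BNDREGS = 3, MPX BNDCSR = 4,
 * AVX-512 = 5..7, PKRU = 9. */
#define XFEATURE_MASK_FP        (1ULL << 0)
#define XFEATURE_MASK_SSE       (1ULL << 1)
#define XFEATURE_MASK_YMM       (1ULL << 2)
#define XFEATURE_MASK_BNDREGS   (1ULL << 3)
#define XFEATURE_MASK_BNDCSR    (1ULL << 4)
#define XFEATURE_MASK_AVX512    (7ULL << 5)
#define XFEATURE_MASK_PKRU      (1ULL << 9)

/* Features that generate FPU exceptions, so they can be handled lazily. */
#define XFEATURE_MASK_LAZY  (XFEATURE_MASK_FP | XFEATURE_MASK_SSE | \
                             XFEATURE_MASK_YMM | XFEATURE_MASK_AVX512)

/* Features that never trap: after the fix PKRU sits here next to MPX. */
#define XFEATURE_MASK_EAGER (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR | \
                             XFEATURE_MASK_PKRU)

/* Hypothetical helper: which features stay enabled for a given FPU mode. */
static uint64_t usable_xfeatures(uint64_t supported, int lazy_mode)
{
    if (lazy_mode)                       /* eagerfpu=off */
        return supported & ~XFEATURE_MASK_EAGER;
    return supported;
}
```

      With PKRU in the eager mask, booting with eagerfpu=off drops protection
      keys together with MPX instead of leaving a non-trapping feature active.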
      
      Fixes: c8df4009 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures")
      Signed-off-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/apic: Prevent pointless warning messages · df610d67
      Committed by Thomas Gleixner
      Markus reported that he sees new warnings:
      
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.
      
      This comes from the recent persistent cpuid - nodeid changes. Prior to
      these changes, the code which emits the warning was called only for
      enabled processors. Now it is called for disabled processors as well, to
      get the possible CPU accounting correct. So if the kernel is compiled for
      the number of actually available/enabled CPUs and the BIOS reports
      disabled CPUs as well, the above warnings are printed.
      
      That's pointless, as the warning only makes sense if more CPUs are
      enabled than the kernel supports.

      Make the warning conditional on enabled processors so we are back to the
      state before these changes.
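
      The fix reduces to a predicate like the one below. This is a
      hypothetical model rather than the kernel code: the function name and
      parameters are illustrative, and a processor over the limit is still
      ignored either way; only the warning becomes conditional.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: a processor beyond the NR_CPUS/possible_cpus
 * limit is ignored, but the warning is only printed if the firmware
 * reported that processor as enabled. */
static bool cpu_limit_warning_needed(unsigned int num_processors,
                                     unsigned int nr_cpu_ids,
                                     bool enabled)
{
    bool over_limit = num_processors >= nr_cpu_ids;

    return over_limit && enabled;
}
```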
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping")
      Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/acpi: Prevent LAPIC id 0xff from being accounted · f3bf1dbe
      Committed by Thomas Gleixner
      Yinghai reported that the recent changes to make the cpuid - nodeid
      relationship permanent cause a cpuid ordering regression on a system which
      has x2apic enabled.
      
      The reason is that the ACPI local APIC parser has no sanity check for
      apicid 0xff, which is an invalid id. So a CPU id for this invalid local
      APIC id is allocated, which therefore breaks the cpuid ordering.
      
      Add a sanity check to acpi_parse_lapic() which ignores the invalid id.
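
      A simplified sketch of the check, assuming a hypothetical struct
      madt_lapic that carries only the fields needed here. Entries with
      APIC id 0xff are skipped before a logical CPU id is allocated, so the
      ids of the following valid entries stay dense and ordered.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_LAPIC_ID 0xff

/* Hypothetical, reduced MADT local APIC entry. */
struct madt_lapic {
    uint8_t  acpi_id;
    uint8_t  apic_id;
    uint32_t flags;   /* bit 0: processor enabled (not used in this sketch) */
};

/* Assign logical CPU ids in table order, skipping the invalid id 0xff.
 * cpu_of_entry[i] receives the id for entry i, or -1 if it was ignored.
 * Returns the number of ids handed out. */
static int assign_cpu_ids(const struct madt_lapic *e, int n, int *cpu_of_entry)
{
    int next_cpu = 0;

    for (int i = 0; i < n; i++) {
        if (e[i].apic_id == INVALID_LAPIC_ID) {
            cpu_of_entry[i] = -1;   /* ignored: no CPU id consumed */
            continue;
        }
        cpu_of_entry[i] = next_cpu++;
    }
    return next_cpu;
}
```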
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping")
      Reported-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: douly.fnst@cn.fujitsu.com,
      Cc: zhugh.fnst@cn.fujitsu.com
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Lv Zheng <lv.zheng@intel.com>,
      Cc: robert.moore@intel.com
      Cc: linux-acpi@vger.kernel.org
      Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com
    • nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Committed by Chris Metcalf
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
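
      The address check can be sketched as a half-open range test. The struct
      text_range and both helpers are hypothetical; in the kernel the section
      bounds come from linker-script symbols, which this userspace sketch
      replaces with plain numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the bounds of the .cpuidle.text section. */
struct text_range {
    uintptr_t start;
    uintptr_t end;    /* exclusive */
};

static bool in_cpuidle_text(const struct text_range *idle, uintptr_t pc)
{
    return pc >= idle->start && pc < idle->end;
}

/* Print the one-line report for an idle CPU and return true, meaning the
 * full backtrace can be skipped; return false for a busy CPU. */
static bool report_if_idle(const struct text_range *idle, int cpu, uintptr_t pc)
{
    if (!in_cpuidle_text(idle, pc))
        return false;
    printf("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
           cpu, (unsigned long)pc);
    return true;
}
```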
      
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: Petr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Committed by Chris Metcalf
      Patch series "improvements to the nmi_backtrace code" v9.
      
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      
      The motivation comes from the task isolation code, where we want to be
      able to diagnose cases in which some cpu is about to interrupt a
      task-isolated cpu.  It can be helpful to see both where the
      interrupting cpu is, and also an approximation of where the cpu that
      is being interrupted is.  The nmi_backtrace framework allows us to
      discover the stack of the interrupted cpu.
      
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out which the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      
      This patch (of 4):
      
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
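
      A sketch of how the old all/allbutself requests map onto a single
      cpumask primitive, assuming a hypothetical 64-bit cpumask64 (one bit
      per CPU) in place of the kernel's struct cpumask. The helper names are
      illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed-size cpumask: bit N set means CPU N is selected. */
typedef unsigned long long cpumask64;

/* The old "all" / "all but self" requests expressed as a mask. */
static cpumask64 backtrace_mask_all(cpumask64 online, int self, bool include_self)
{
    cpumask64 mask = online;

    if (!include_self)
        mask &= ~(1ULL << self);
    return mask;
}

/* A single remote CPU is just a one-bit mask. */
static cpumask64 backtrace_mask_single(int cpu)
{
    return 1ULL << cpu;
}
```

      Once every request is a mask, one backend can serve all of them, which
      is why the patch makes the cpumask the underlying primitive.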
      
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE · 51a02124
      Committed by Vineet Gupta
      This came to light when implementing native 64-bit atomics for ARCv2.
      
      The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
      to check whether atomic64_dec_if_positive() is available.  It seems it
      was needed when not every arch defined it.  However, in the current code
      the Kconfig option seems needless:

       - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
         generic definition of the API is present in lib/atomic64.c
       - arches with native 64-bit atomics select it in arch/*/Kconfig and
         define the API in their headers

      So I see no point in keeping the Kconfig option.
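
      For reference, a userspace sketch of the contract that
      atomic64_dec_if_positive() provides, written with C11 atomics rather
      than any architecture's native implementation: decrement only if the
      result stays non-negative, and return the old value minus one either
      way. dec_if_positive() is an illustrative name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Decrement *v unless that would make it negative. Returns old - 1 in
 * both cases, so a negative return means *v was left unchanged. */
static int64_t dec_if_positive(_Atomic int64_t *v)
{
    int64_t old = atomic_load(v);
    int64_t dec;

    do {
        dec = old - 1;
        if (dec < 0)
            break;              /* would go negative: leave *v alone */
        /* On failure, old is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(v, &old, dec));

    return dec;
}
```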
      
      Compile tested for:
       - blackfin (CONFIG_GENERIC_ATOMIC64)
       - x86 (!CONFIG_GENERIC_ATOMIC64)
       - ia64
      
      Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ming Lin <ming.l@ssi.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE · 461a7184
      Committed by Yisheng Xie
      Avoid letting the #ifdef become unwieldy as more architectures support
      gigantic pages.  No functional change with this patch.
      
      Link: http://lkml.kernel.org/r/1475227569-63446-2-git-send-email-xieyisheng1@huawei.com
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move phys_mem_access_prot_allowed() declaration to pgtable.h · 08ea8c07
      Committed by Baoyou Xie
      Building the kernel with W=1 produces one warning:
      
        drivers/char/mem.c:220:12: warning: no previous prototype for 'phys_mem_access_prot_allowed' [-Wmissing-prototypes]
         int __weak phys_mem_access_prot_allowed(struct file *file,
      
      In fact, its declaration is spread across the header files of several
      architectures, but it needs to be declared in a common header file.
      
      So this patch moves phys_mem_access_prot_allowed() to pgtable.h.
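
      A minimal sketch of the pattern: one shared declaration that both the
      callers and the definition see, plus a weak generic definition that an
      architecture can override. The struct file and pgprot_t declarations
      are stand-ins for the kernel types, the kernel spells the attribute
      __weak, and the default body is assumed here to simply allow the access.

```c
#include <assert.h>

/* Stand-ins for kernel types (assumptions for this sketch). */
struct file;
typedef struct { unsigned long pgprot; } pgprot_t;

/* What the common header would carry. Because the definition below now
 * has a previous prototype, -Wmissing-prototypes stays quiet. */
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
                                 unsigned long size, pgprot_t *vma_prot);

/* Generic weak default; a strong definition elsewhere would replace it. */
__attribute__((weak))
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
                                 unsigned long size, pgprot_t *vma_prot)
{
    (void)file; (void)pfn; (void)size; (void)vma_prot;
    return 1;   /* access allowed */
}
```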
      
      Link: http://lkml.kernel.org/r/1473751597-12139-1-git-send-email-baoyou.xie@linaro.org
      Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 07 Oct, 2016: 1 commit
    • arch/x86: Handle non enumerated CPU after physical hotplug · 2a51fe08
      Committed by Prarit Bhargava
      When a CPU is physically added to a system, the MADT table is not
      updated.
      
      If a kdump kernel is subsequently started on that physically added CPU,
      the ACPI enumeration fails to provide the information for this CPU, which
      is now the boot CPU of the kdump kernel.
      
      As a consequence, generic_processor_info() is not invoked for that CPU so
      the number of enumerated processors is 0 and none of the initializations,
      including the logical package id management, are performed.
      
      We have code which relies on the correctness of the logical package map and
      other information which is initialized via generic_processor_info().
      Executing such code will result in undefined behaviour or kernel crashes.
      
      This problem applies only to the kdump kernel because a normal kexec will
      switch to the original boot CPU, which is enumerated in MADT, before
      jumping into the kexec kernel.
      
      The boot code already has a check for num_processors equal to 0 in
      prefill_possible_map(). We can use that check as an indicator that the
      enumeration of the boot CPU did not happen and invoke generic_processor_info()
      for it. That initializes the relevant data for the boot CPU and therefore
      prevents subsequent failures.
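
      The recovery path can be modelled as below. struct topo_state,
      register_processor() and ensure_boot_cpu_enumerated() are hypothetical;
      register_processor() stands in for generic_processor_info(), and the
      real code may validate the APIC id before calling it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the boot-time topology state. */
struct topo_state {
    int  num_processors;
    bool boot_cpu_initialized;   /* e.g. logical package id set up */
};

/* Stand-in for generic_processor_info(): registers one CPU. */
static void register_processor(struct topo_state *s, int apicid)
{
    (void)apicid;
    s->num_processors++;
    s->boot_cpu_initialized = true;
}

/* If firmware enumeration found no processor at all, the boot CPU itself
 * was missed (e.g. a hot-added CPU running a kdump kernel): register it. */
static void ensure_boot_cpu_enumerated(struct topo_state *s, int boot_apicid)
{
    if (s->num_processors == 0)
        register_processor(s, boot_apicid);
}
```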
      
      [ tglx: Refined the code and rewrote the changelog ]
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Fixes: 1f12e32f ("x86/topology: Create logical package id")
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: dyoung@redhat.com
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 06 Oct, 2016: 2 commits
    • xen/x86: Update topology map for PV VCPUs · a6a198bc
      Committed by Boris Ostrovsky
      Early during boot topology_update_package_map() computes
      logical_pkg_ids for all present processors.
      
      Later, when processors are brought up, identify_cpu() updates
      these values based on phys_pkg_id, which is a function of
      initial_apicid. On PV guests the latter may point to a
      non-existent node, causing logical_pkg_ids to be set to -1.
      
      Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
      to index its arrays and therefore in this case will use index
      65535 (since logical_pkg_id is a u16). This could lead either to a
      crash or to an access of a random memory location.
      
      As a workaround, we recompute topology during CPU bringup to reset
      logical_pkg_id to a valid value.
      
      (initial_apicid is bogus because it is the initial_apicid of the
      processor from which the guest is launched. This value is
      CPUID(1).EBX[31:24].)
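
      The arithmetic behind index 65535 is easy to check: storing -1 in a u16
      wraps to its maximum value. The bounds check below is a hypothetical
      illustration of why such an id must never reach an array index
      (NR_PACKAGES is an assumed array size); the actual patch instead
      recomputes the topology so that the id is valid again.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PACKAGES 4   /* hypothetical size of a per-package array */

/* An id of (u16)-1 == 65535 is far past the end of any such array. */
static int package_index_valid(uint16_t logical_pkg_id)
{
    return logical_pkg_id < NR_PACKAGES;
}
```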
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • x86/unwind: Fix oprofile module link error · cfee9edd
      Committed by Josh Poimboeuf
      When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n,
      the oprofile module fails to link:
      
        ERROR: "ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined!
      
      The problem was introduced when oprofile was converted to use the new
      x86 unwinder.  When frame pointers are disabled, the "guess" unwinder's
      unwind_get_return_address() is an inline function which calls
      ftrace_graph_ret_addr(), which is not exported.
      
      Fix it by converting the "guess" version of unwind_get_return_address()
      to an exported out-of-line function, just like its frame pointer
      counterpart.
      Reported-by: Karl Beldan <karl.beldan@gmail.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ec2ad9cc ("oprofile/x86: Convert x86_backtrace() to use the new unwinder")
      Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 05 Oct, 2016: 4 commits
  5. 04 Oct, 2016: 1 commit
    • x86/irq: Prevent force migration of irqs which are not in the vector domain · db91aa79
      Committed by Mika Westerberg
      When a CPU is about to be offlined, we call fixup_irqs(), which resets IRQ
      affinities related to the CPU in question. The same thing is also done when
      the system is suspended to S-states like S3 (mem).
      
      For each IRQ we try to complete any ongoing move regardless of whether the
      IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
      its chip_data, assume it is of type struct apic_chip_data and manipulate it
      by clearing the old_domain mask etc. As a result, irq_chips that are not
      part of x86_vector_domain, like those created by various GPIO drivers,
      find their chip_data changed unexpectedly.
      
      Below is an example where the GPIO chip owned by pinctrl-sunrisepoint.c
      gets corrupted after resume:
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  hi
      
        # rtcwake -s10 -mmem
        <10 seconds passes>
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  ?
      
      Note '?' in the output. It means the struct gpio_chip ->get function is
      NULL whereas before suspend it was there.
      
      Fix this by first checking that the IRQ belongs to x86_vector_domain before
      we try to use the chip_data as struct apic_chip_data.
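
      A simplified sketch of the guard. The structures are hypothetical
      reductions of the kernel's (old_domain is a plain integer here, not a
      cpumask), and x86_vector_domain is a local stand-in; the point is that
      chip_data is only reinterpreted after checking which domain owns the IRQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified IRQ structures. */
struct irq_domain { const char *name; };
struct apic_chip_data { unsigned long old_domain; };
struct irq_data {
    struct irq_domain *domain;
    void *chip_data;   /* its real type depends on the owning domain */
};

static struct irq_domain x86_vector_domain = { "VECTOR" };

/* Return chip_data as apic_chip_data only for vector-domain IRQs. */
static struct apic_chip_data *apic_chip_data_of(struct irq_data *d)
{
    if (d->domain != &x86_vector_domain)
        return NULL;
    return d->chip_data;
}

static void complete_move(struct irq_data *d)
{
    struct apic_chip_data *data = apic_chip_data_of(d);

    if (data)
        data->old_domain = 0;   /* clear the old_domain mask */
}
```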
      Reported-and-tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: stable@vger.kernel.org # 4.4+
      Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 30 Sep, 2016: 12 commits
  7. 29 Sep, 2016: 1 commit
  8. 27 Sep, 2016: 1 commit
  9. 26 Sep, 2016: 2 commits
  10. 25 Sep, 2016: 1 commit
  11. 24 Sep, 2016: 1 commit
  12. 23 Sep, 2016: 6 commits
    • x86/PCI: VMD: Request userspace control of PCIe hotplug indicators · 3161832d
      Committed by Keith Busch
      Add set_dev_domain_options() to set PCI domain-specific options as devices
      are added.  The first usage is to request exclusive userspace control of
      PCIe hotplug indicators in VMD domains.
      
      Devices in a VMD domain use PCIe hotplug Attention and Power Indicators in
      a non-standard way; tell pciehp to ignore the indicators so userspace can
      control them via the sysfs "attention" file.
      
      To determine whether a bus is within a VMD domain, add a bool to the
      pci_sysdata structure that the VMD driver sets during initialization.
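
      A sketch of the mechanism under simplified, hypothetical structures: the
      bool in pci_sysdata, the is_vmd() helper and the per-device
      hotplug_user_indicators flag are assumptions modelled on the description
      above, not copies of the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified structures. */
struct pci_sysdata { bool vmd_domain; };        /* set by the VMD driver */
struct pci_bus { struct pci_sysdata *sysdata; };
struct pci_dev {
    struct pci_bus *bus;
    bool hotplug_user_indicators;   /* pciehp leaves indicators to userspace */
};

static bool is_vmd(const struct pci_bus *bus)
{
    return bus->sysdata && bus->sysdata->vmd_domain;
}

/* Called as each device is added. */
static void set_dev_domain_options(struct pci_dev *pdev)
{
    if (is_vmd(pdev->bus))
        pdev->hotplug_user_indicators = true;
}
```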
      
      [bhelgaas: changelog]
      Requested-by: Kapil Karkra <kapil.karkra@intel.com>
      Tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • KVM: nVMX: Fix the NMI IDT-vectoring handling · c5a6d5f7
      Committed by Wanpeng Li
      Run kvm-unit-tests/eventinj.flat in L1:
      
      Sending NMI to self
      After NMI to self
      FAIL: NMI
      
      This scenario tests whether the VMM handles NMI IDT-vectoring info
      correctly.

      At the beginning, L2 writes to the LAPIC to send itself an NMI. The EPT
      page tables on both L1 and L0 are empty, so:

      - L2's memory access generates an EPT violation, which is intercepted by L0.

        The EPT violation vmexit occurs during delivery of this NMI, and the NMI
        info is recorded in vmcs02's IDT-vectoring info.

      - L0 walks L1's EPT12, sees that the mapping is invalid, and injects the
        EPT violation into L1.

        vmcs02's IDT-vectoring info is reflected into vmcs12's IDT-vectoring
        info, since this is a nested vmexit.

      - L1 receives the EPT violation, then fixes its EPT12.
      - L1 executes VMRESUME to resume L2, which causes a vmexit from L1 to L0.
      - L0 emulates L1's VMRESUME, then returns to L2.

        L0 merges the requirements of vmcs12's IDT-vectoring info and injects
        them into L2 through vmcs02.

      - L2 re-executes the faulting instruction and causes an EPT violation
        again.
      - Since L1's EPT12 is now valid, L0 can fix its EPT02.
      - L0 resumes L2.

        The EPT violation vmexit occurs during delivery of this NMI again, and
        the NMI info is recorded in vmcs02's IDT-vectoring info. L0 should
        inject the NMI through vmentry event injection, since it is caused by
        EPT02's EPT violation.

      However, vmx_inject_nmi() refuses to inject an NMI from IDT-vectoring
      info if the vCPU is in guest mode. This patch fixes that by permitting
      NMI injection from IDT-vectoring info when it is L0's responsibility to
      inject that NMI into L2.
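
      The condition L0 has to recognise can be sketched by decoding the
      IDT-vectoring information field. The layout (vector in bits 7:0,
      interruption type in bits 10:8 with type 2 meaning NMI, valid flag in
      bit 31) follows the Intel SDM; the macro and function names here are
      illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDT-vectoring information field layout (Intel SDM). */
#define VECTORING_INFO_VECTOR_MASK  0xffu
#define VECTORING_INFO_TYPE_MASK    0x700u
#define VECTORING_INFO_VALID_MASK   0x80000000u
#define INTR_TYPE_NMI_INTR          (2u << 8)

/* True if the exit interrupted the delivery of an NMI, i.e. the NMI must
 * be re-injected on the next VM entry. */
static bool vectoring_info_is_nmi(uint32_t info)
{
    return (info & VECTORING_INFO_VALID_MASK) &&
           (info & VECTORING_INFO_TYPE_MASK) == INTR_TYPE_NMI_INTR;
}
```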
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Bandan Das <bsd@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive · f6e90f9e
      Committed by Wanpeng Li
      I observed that kvmvapic (which optimizes flexpriority=N or AMD) is used
      to boost TPR access when testing the kvm-unit-test/eventinj.flat tpr case
      on my Haswell desktop (with flexpriority, without APICv). Commit 8d14695f
      ("x86, apicv: add virtual x2apic support") disables virtual x2apic mode
      completely without APICv, and its author also told me that Windows guests
      could not enter x2apic mode when he developed the APICv feature several
      years ago. However, this is no longer true: Interrupt Remapping and
      vIOMMU have been added to QEMU, and developers from Intel recently
      verified that Windows 8 works in x2apic mode with Interrupt Remapping
      enabled.
      
      This patch enables TPR shadow for virtual x2apic mode to boost
      Windows guests in x2apic mode even without APICv.
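
      Enabling the MSR-based TPR shadow in x2apic mode essentially means
      letting the guest access the x2APIC TPR MSR (0x808) without a VM exit.
      A sketch of clearing its intercept bits in a 4 KiB VMX MSR bitmap,
      assuming the SDM layout (read bitmap for low MSRs at byte 0, write
      bitmap for low MSRs at byte 2048); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define X2APIC_MSR_BASE 0x800u
#define APIC_TASKPRI    0x80u   /* xAPIC MMIO offset of the TPR */
#define X2APIC_TPR_MSR  (X2APIC_MSR_BASE + (APIC_TASKPRI >> 4))   /* 0x808 */

/* In the MSR bitmap a set bit means "intercept". Clear the read and write
 * bits of a low-range MSR (0x0 - 0x1fff) so the guest accesses it directly. */
static void msr_bitmap_allow_rw_low(uint8_t bitmap[4096], uint32_t msr)
{
    assert(msr <= 0x1fff);
    bitmap[msr / 8]        &= (uint8_t)~(1u << (msr % 8));   /* read  */
    bitmap[2048 + msr / 8] &= (uint8_t)~(1u << (msr % 8));   /* write */
}

static bool msr_read_intercepted(const uint8_t bitmap[4096], uint32_t msr)
{
    return bitmap[msr / 8] & (1u << (msr % 8));
}
```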
      
      It passes the kvm-unit-test.
      Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
      Suggested-by: Wincy Van <fanwenyi0529@gmail.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wincy Van <fanwenyi0529@gmail.com>
      Cc: Yang Zhang <yang.zhang.wz@gmail.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: Fix reload apic access page warning · c83b6d15
      Committed by Wanpeng Li
      WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
      CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
      Call Trace:
       dump_stack+0x99/0xd0
       __warn+0xd1/0xf0
       warn_slowpath_fmt+0x4f/0x60
       ? prepare_to_swait+0x39/0xa0
       ? prepare_to_swait+0x39/0xa0
       __might_sleep+0x7e/0x80
       __gfn_to_pfn_memslot+0x156/0x480 [kvm]
       gfn_to_pfn+0x2a/0x30 [kvm]
       gfn_to_page+0xe/0x20 [kvm]
       kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
       nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
       ? _raw_spin_unlock_irqrestore+0x36/0x80
       vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
       kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
       kvm_vcpu_check_block+0x12/0x60 [kvm]
       kvm_vcpu_block+0x94/0x4c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
       ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
       kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.8.0-rc5+ #47 Not tainted
      -------------------------------
      ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-x86/4230:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]
      
      stack backtrace:
      CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
      Call Trace:
       dump_stack+0x99/0xd0
       lockdep_rcu_suspicious+0xe7/0x120
       gfn_to_memslot+0x12a/0x140 [kvm]
       gfn_to_pfn+0x12/0x30 [kvm]
       gfn_to_page+0xe/0x20 [kvm]
       kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
       nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
       ? _raw_spin_unlock_irqrestore+0x36/0x80
       vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
       kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
       kvm_vcpu_check_block+0x12/0x60 [kvm]
       kvm_vcpu_block+0x94/0x4c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
       ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
       kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
       ? __fget+0xfd/0x210
       ? __lock_is_held+0x54/0x70
       do_vfs_ioctl+0x96/0x6a0
       ? __fget+0x11c/0x210
       ? __fget+0x5/0x210
       SyS_ioctl+0x79/0x90
       do_syscall_64+0x81/0x220
       entry_SYSCALL64_slow_path+0x25/0x25
      
      These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat
      
      The nested preemption timer is based on an hrtimer which is started on L2
      entry, stopped on L2 exit and evaluated via the new check_nested_events
      hook. The current logic adds the vCPU to a simple waitqueue
      (TASK_INTERRUPTIBLE) if it needs to yield the pCPU, and accesses memslots
      without holding the SRCU read lock; both can happen in the nested
      preemption timer evaluation path, which results in the warnings above.
      
      This patch fixes it by using a request bit to reload the APIC access
      page asynchronously before vmentry, instead of reloading it directly
      during nested preemption timer evaluation. This is safe since vmcs01 is
      loaded and the current exit is a nested vmexit.
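
      The deferral can be modelled with a per-vCPU request bitmap: the nested
      timer path only sets a bit, and the bit is tested and cleared right
      before VM entry, where sleeping and memslot access are allowed. KVM's
      own helpers for this are kvm_make_request() and kvm_check_request()
      with KVM_REQ_APIC_PAGE_RELOAD; the sketch below is a simplified,
      non-atomic stand-in with hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request bit number. */
#define REQ_APIC_PAGE_RELOAD 0

struct vcpu {
    unsigned long requests;
    int apic_page_reloads;   /* counts the deferred reloads performed */
};

/* Cheap and non-blocking: only records that work is due. */
static void make_request(struct vcpu *v, int req)
{
    v->requests |= 1UL << req;
}

/* Test-and-clear, as done on the way into the guest. */
static bool check_request(struct vcpu *v, int req)
{
    if (!(v->requests & (1UL << req)))
        return false;
    v->requests &= ~(1UL << req);
    return true;
}

/* Runs right before VM entry, in a context where the reload may sleep. */
static void before_vmentry(struct vcpu *v)
{
    if (check_request(v, REQ_APIC_PAGE_RELOAD))
        v->apic_page_reloads++;   /* stand-in for reloading the page */
}
```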
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • config: move x86 kvm_guest.config to a common location · bd6c9222
      Committed by Rob Herring
      kvm_guest.config is useful for KVM guests on other arches, and nothing
      in it appears to be x86 specific, so just move the whole file. Kbuild
      will find it in either location.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Cc: kvm@vger.kernel.org
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • x86/platform/mellanox: Introduce support for Mellanox systems platform · 58cbbee2
      Committed by Vadim Pasternak
      Enable system support for the Mellanox Technologies platform, which
      provides support for the following Mellanox basic systems: "msx6710",
      "msx6720", "msb7700", "msn2700", "msx1410", "msn2410", "msb7800",
      "msn2740", "msn2100", as well as various derivative systems based on
      the above basic types.
      
      The Kconfig controlling compilation of this code is: MLX_PLATFORM
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Cc: jiri@resnulli.us
      Cc: gregkh@linuxfoundation.org
      Cc: platform-driver-x86@vger.kernel.org
      Cc: geert@linux-m68k.org
      Cc: linux@roeck-us.net
      Cc: akpm@linux-foundation.org
      Cc: mchehab@kernel.org
      Cc: davem@davemloft.net
      Cc: kvalo@codeaurora.org
      Link: http://lkml.kernel.org/r/1474578822-33805-1-git-send-email-vadimp@mellanox.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>