1. 02 11月, 2017 1 次提交
    • R
      x86/mm: Relocate page fault error codes to traps.h · 1067f030
      Ricardo Neri 提交于
      Up to this point, only fault.c used the definitions of the page fault error
      codes. Thus, it made sense to keep them within such file. Other portions of
      code might be interested in those definitions too. For instance, the User-
      Mode Instruction Prevention emulation code will use such definitions to
      emulate a page fault when it is unable to successfully copy the results
      of the emulated instructions to user space.
      
      While relocating the error code enumeration, the prefix X86_ is used to
      make it consistent with the rest of the definitions in traps.h. Of course,
      code using the enumeration had to be updated as well. No functional changes
      were performed.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: ricardo.neri@intel.com
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
      1067f030
  2. 27 10月, 2017 3 次提交
    • I
      Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses" · 90edaac6
      Ingo Molnar 提交于
      This reverts commit ce56a86e.
      
      There's unanticipated interaction with some boot parameters like 'mem=',
      which now cause the new checks via valid_mmap_phys_addr_range() to be too
      restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
      
      So while the motivation of the change is still entirely valid, we
      need a few more rounds of testing to get it right - it's way too late
      after -rc6, so revert it for now.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90edaac6
    • S
      arm/xen: don't inclide rwlock.h directly. · a494ee6c
      Sebastian Andrzej Siewior 提交于
      rwlock.h should not be included directly. Instead linux/splinlock.h
      should be included. One thing it does is to break the RT build.
      
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: xen-devel@lists.xenproject.org
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: NStefano Stabellini <sstabellini@kernel.org>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      a494ee6c
    • L
      alpha/PCI: Move pci_map_irq()/pci_swizzle() out of initdata · 814eae59
      Lorenzo Pieralisi 提交于
      The introduction of {map/swizzle}_irq() hooks in the struct pci_host_bridge
      allowed to replace the pci_fixup_irqs() PCI IRQ allocation in alpha arch
      PCI code with per-bridge map/swizzle functions with commit 0e4c2eeb
      ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping
      hooks").
      
      As a side effect of converting PCI IRQ allocation to the struct
      pci_host_bridge {map/swizzle}_irq() hooks mechanism, the actual PCI IRQ
      allocation function (ie pci_assign_irq()) is carried out per-device in
      pci_device_probe() that is called when a PCI device driver is about to be
      probed.
      
      This means that, for drivers compiled as loadable modules, the actual PCI
      device IRQ allocation can now happen after the system has booted so the
      struct pci_host_bridge {map/swizzle}_irq() hooks pci_assign_irq() relies on
      must stay valid after the system has booted so that PCI core can carry out
      PCI IRQ allocation correctly.
      
      Most of the alpha board structures pci_map_irq() and pci_swizzle() hooks
      (that are used to initialize their struct pci_host_bridge equivalent
      through the alpha_mv global variable - that represents the struct
      alpha_machine_vector of the running kernel) are marked as
      __init/__initdata; this causes freed memory dereferences when PCI IRQ
      allocation is carried out after the kernel has booted (ie when loading PCI
      drivers as loadable module) because when the kernel tries to bind the PCI
      device to its (module) driver, the function pci_assign_irq() is called,
      that in turn retrieves the struct pci_host_bridge {map/swizzle}_irq() hooks
      to carry out PCI IRQ allocation; if those hooks are marked as __init
      code/__initdata they point at freed/invalid memory.
      
      Fix the issue by removing the __init/__initdata markers from all subarch
      struct alpha_machine_vector.pci_map_irq()/pci_swizzle() functions (and
      data).
      
      Fixes: 0e4c2eeb ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks")
      Link: http://lkml.kernel.org/r/alpine.LRH.2.21.1710251043170.7098@math.ut.eeReported-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Meelis Roos <mroos@linux.ee>
      Cc: Matt Turner <mattst88@gmail.com>
      814eae59
  3. 25 10月, 2017 1 次提交
    • M
      s390/kvm: fix detection of guest machine checks · 0a5e2ec2
      Martin Schwidefsky 提交于
      The new detection code for guest machine checks added a check based
      on %r11 to .Lcleanup_sie to distinguish between normal asynchronous
      interrupts and machine checks. But the funtion is called from the
      program check handler as well with an undefined value in %r11.
      
      The effect is that all program exceptions pointing to the SIE instruction
      will set the CIF_MCCK_GUEST bit. The bit stays set for the CPU until the
       next machine check comes in which will incorrectly be interpreted as a
      guest machine check.
      
      The simplest fix is to stop using .Lcleanup_sie in the program check
      handler and duplicate a few instructions.
      
      Fixes: c929500d ("s390/nmi: s390: New low level handling for machine check happening in guest")
      Cc: <stable@vger.kernel.org> # v4.13+
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      0a5e2ec2
  4. 24 10月, 2017 1 次提交
  5. 23 10月, 2017 2 次提交
  6. 22 10月, 2017 1 次提交
  7. 20 10月, 2017 1 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Craig Bergstrom 提交于
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce56a86e
  8. 19 10月, 2017 5 次提交
  9. 18 10月, 2017 6 次提交
  10. 17 10月, 2017 1 次提交
  11. 16 10月, 2017 4 次提交
  12. 14 10月, 2017 6 次提交
    • A
      ARM: dts: imx7d: Invert legacy PCI irq mapping · 1c86c9dd
      Andrey Smirnov 提交于
      According to i.MX7D reference manual (Rev. 0.1, table 7-1, page 1221)
      legacy PCI interrupt mapping is as follows:
      
       - PCIE INT A is IRQ 122
       - PCIE INT B is IRQ 123
       - PCIE INT C is IRQ 124
       - PCIE INT D is IRQ 125
      
      Invert the mapping information in corresponding DT node to reflect
      that.
      
      Cc: yurovsky@gmail.com
      Cc: Fabio Estevam <fabio.estevam@nxp.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrey Smirnov <andrew.smirnov@gmail.com>
      Fixes: a816d575 ("ARM: dts: imx7d: Add node for PCIe controller")
      Signed-off-by: NShawn Guo <shawnguo@kernel.org>
      1c86c9dd
    • B
      x86/microcode: Do the family check first · 1f161f67
      Borislav Petkov 提交于
      On CPUs like AMD's Geode, for example, we shouldn't even try to load
      microcode because they do not support the modern microcode loading
      interface.
      
      However, we do the family check *after* the other checks whether the
      loader has been disabled on the command line or whether we're running in
      a guest.
      
      So move the family checks first in order to exit early if we're being
      loaded on an unsupported family.
      Reported-and-tested-by: NSven Glodowski <glodi1@arcor.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.11..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
      Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f161f67
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
    • A
      KVM: PPC: Book3S: Protect kvmppc_gpa_to_ua() with SRCU · 8f6a9f0d
      Alexey Kardashevskiy 提交于
      kvmppc_gpa_to_ua() accesses KVM memory slot array via
      srcu_dereference_check() and this produces warnings from RCU like below.
      
      This extends the existing srcu_read_lock/unlock to cover that
      kvmppc_gpa_to_ua() as well.
      
      We did not hit this before as this lock is not needed for the realmode
      handlers and hash guests would use the realmode path all the time;
      however the radix guests are always redirected to the virtual mode
      handlers and hence the warning.
      
      [   68.253798] ./include/linux/kvm_host.h:575 suspicious rcu_dereference_check() usage!
      [   68.253799]
                     other info that might help us debug this:
      
      [   68.253802]
                     rcu_scheduler_active = 2, debug_locks = 1
      [   68.253804] 1 lock held by qemu-system-ppc/6413:
      [   68.253806]  #0:  (&vcpu->mutex){+.+.}, at: [<c00800000e3c22f4>] vcpu_load+0x3c/0xc0 [kvm]
      [   68.253826]
                     stack backtrace:
      [   68.253830] CPU: 92 PID: 6413 Comm: qemu-system-ppc Tainted: G        W       4.14.0-rc3-00553-g432dcba58e9c-dirty #72
      [   68.253833] Call Trace:
      [   68.253839] [c000000fd3d9f790] [c000000000b7fcc8] dump_stack+0xe8/0x160 (unreliable)
      [   68.253845] [c000000fd3d9f7d0] [c0000000001924c0] lockdep_rcu_suspicious+0x110/0x180
      [   68.253851] [c000000fd3d9f850] [c0000000000e825c] kvmppc_gpa_to_ua+0x26c/0x2b0
      [   68.253858] [c000000fd3d9f8b0] [c00800000e3e1984] kvmppc_h_put_tce+0x12c/0x2a0 [kvm]
      
      Fixes: 121f80ba ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8f6a9f0d
    • N
      KVM: PPC: Book3S HV: POWER9 more doorbell fixes · 2cde3716
      Nicholas Piggin 提交于
      - Add another case where msgsync is required.
      - Required barrier sequence for global doorbells is msgsync ; lwsync
      
      When msgsnd is used for IPIs to other cores, msgsync must be executed by
      the target to order stores performed on the source before its msgsnd
      (provided the source executes the appropriate sync).
      
      Fixes: 1704a81c ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      2cde3716
    • G
      KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM · ac64115a
      Greg Kurz 提交于
      The following program causes a kernel oops:
      
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>
      
      main()
      {
          int fd = open("/dev/kvm", O_RDWR);
          ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM);
      }
      
      This happens because when using the global KVM fd with
      KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets
      called with a NULL kvm argument, which gets dereferenced
      in is_kvmppc_hv_enabled(). Spotted while reading the code.
      
      Let's use the hv_enabled fallback variable, like everywhere
      else in this function.
      
      Fixes: 23528bb2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ac64115a
  13. 13 10月, 2017 1 次提交
    • A
      powerpc/perf: Fix IMC initialization crash · 0d8ba162
      Anju T Sudhakar 提交于
      Panic observed with latest firmware, and upstream kernel:
      
       NIP init_imc_pmu+0x8c/0xcf0
       LR  init_imc_pmu+0x2f8/0xcf0
       Call Trace:
         init_imc_pmu+0x2c8/0xcf0 (unreliable)
         opal_imc_counters_probe+0x300/0x400
         platform_drv_probe+0x64/0x110
         driver_probe_device+0x3d8/0x580
         __driver_attach+0x14c/0x1a0
         bus_for_each_dev+0x8c/0xf0
         driver_attach+0x34/0x50
         bus_add_driver+0x298/0x350
         driver_register+0x9c/0x180
         __platform_driver_register+0x5c/0x70
         opal_imc_driver_init+0x2c/0x40
         do_one_initcall+0x64/0x1d0
         kernel_init_freeable+0x280/0x374
         kernel_init+0x24/0x160
         ret_from_kernel_thread+0x5c/0x74
      
      While registering nest imc at init, cpu-hotplug callback
      nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if
      the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup
      memory and cpuhotplug setup.
      
      But when cleaning up the attribute group, we are dereferencing the
      attribute element array without checking whether the backing element
      is not NULL. This causes the kernel panic.
      
      Add a check for the backing element prior to dereferencing the
      attribute element, to handle the failing case gracefully.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: NPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      [mpe: Trim change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d8ba162
  14. 12 10月, 2017 7 次提交
    • L
      x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping · 616dd587
      Len Brown 提交于
      SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode
      version number than stepping-4.  Linux needs to know this stepping-3
      specific version number to also enable the TSC_DEADLINE on stepping-3.
      
      The steppings and ucode versions are documented in the SKX BIOS update:
      https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txtSigned-off-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com
      616dd587
    • P
      x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors · cc6afe22
      Paolo Bonzini 提交于
      Commit 594a30fb ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled
      due to Errata" on CPUs without the feature", 2017-08-30) was also about
      silencing the warning on VirtualBox; however, KVM does expose the TSC
      deadline timer, and it's virtualized so that it is immune from CPU errata.
      
      Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs:
      
           [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                          please update microcode to version: 0xb2 (or later)
      
      Even if you had a hypervisor that does _not_ virtualize the TSC deadline
      and rather exposes the hardware one, it should be the hypervisors task
      to update microcode and possibly hide the flag from CPUID.  So just
      hide the message when running on _any_ hypervisor, not just those that
      do not support the TSC deadline timer.
      
      The older check still makes sense, so keep it.
      
      Fixes: bd9240a1 ("x86/apic: Add TSC_DEADLINE quirk due to errata")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com
      cc6afe22
    • J
      crypto: x86/chacha20 - satisfy stack validation 2.0 · 4635742d
      Jason A. Donenfeld 提交于
      The new stack validator in objdump doesn't like directly assigning r11
      to rsp, warning with something like:
      
      warning: objtool: chacha20_4block_xor_ssse3()+0xa: unsupported stack pointer realignment
      warning: objtool: chacha20_8block_xor_avx2()+0x6: unsupported stack pointer realignment
      
      This fixes things up to use code similar to gcc's DRAP register, so that
      objdump remains happy.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Fixes: baa41469 ("objtool: Implement stack validation 2.0")
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      4635742d
    • A
      powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() · cd4f2b30
      Anju T Sudhakar 提交于
      Stack trace output during a stress test:
       [    4.310049] Freeing initrd memory: 22592K
      [    4.310646] rtas_flash: no firmware flash support
      [    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
      [    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
      [    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
      [    4.313588] Call Trace:
      [    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
      [    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
      [    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
      [    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
      [    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
      [    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
      [    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
      [    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
      [    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
      [    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
      [    4.314313] Mem-Info:
      [    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0
      
      core_imc_mem_init() at system boot use alloc_pages_node() to get memory
      and alloc_pages_node() throws this stack dump when tried to allocate
      memory from a node which has no memory behind it. Add a ___GFP_NOWARN
      flag in allocation request as a fix.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: NVenkat R.B <venkatb3@in.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cd4f2b30
    • A
      powerpc/perf: Fix for core/nest imc call trace on cpuhotplug · 0d923820
      Anju T Sudhakar 提交于
      Nest/core pmu units are enabled only when it is used. A reference count is
      maintained for the events which uses the nest/core pmu units. Currently in
      *_imc_counters_release function a WARN() is used for notification of any
      underflow of ref count.
      
      The case where event ref count hit a negative value is, when perf session is
      started, followed by offlining of all cpus in a given core.
      i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
      ref->count to zero, if the current cpu which is about to offline is the last
      cpu in a given core and make an OPAL call to disable the engine in that core.
      And on perf session termination, perf->destroy (core_imc_counters_release) will
      first decrement the ref->count for this core and based on the ref->count value
      an opal call is made to disable the core-imc engine.
      Now, since cpuhotplug path already clears the ref->count for core and disabled
      the engine, perf->destroy() decrementing again at event termination make it
      negative which in turn fires the WARN_ON. The same happens for nest units.
      
      Add a check to see if the reference count is alreday zero, before decrementing
      the count, so that the ref count will not hit a negative value.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: NSantosh Sivaraj <santosh@fossix.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d923820
    • H
      KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit · 8eb3f87d
      Haozhong Zhang 提交于
      When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
      guest CR4. Before this CR4 loading, the guest CR4 refers to L2
      CR4. Because these two CR4's are in different levels of guest, we
      should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
      is used to handle guest writes to its CR4, checks the guest change to
      CR4 and may fail if the change is invalid.
      
      The failure may cause trouble. Consider we start
        a L1 guest with non-zero L1 PCID in use,
           (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
      and
        a L2 guest with L2 PCID disabled,
           (i.e. L2 CR4.PCIDE == 0)
      and following events may happen:
      
      1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
         into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
         of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
         vcpu->arch.cr4) is left to the value of L2 CR4.
      
      2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
         kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
         because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
         CR3.PCID != 0, L0 KVM will inject GP to L1 guest.
      
      Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
      Cc: qemu-stable@nongnu.org
      Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8eb3f87d
    • N
      ARM: 8704/1: semihosting: use proper instruction on v7m processors · ee3eaee6
      Nicolas Pitre 提交于
      The svc instruction doesn't exist on v7m processors. Semihosting ops are
      invoked with the bkpt instruction instead.
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      ee3eaee6