1. 30 10月, 2017 1 次提交
  2. 20 10月, 2017 1 次提交
    • C
      x86/mm: Limit mmap() of /dev/mem to valid physical addresses · ce56a86e
      Craig Bergstrom 提交于
      Currently, it is possible to mmap() any offset from /dev/mem.  If a
      program mmaps() /dev/mem offsets outside of the addressable limits
      of a system, the page table can be corrupted by setting reserved bits.
      
      For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
      x86_64 system with a 48-bit bus, the page fault handler will be called
      with error_code set to RSVD.  The kernel then crashes with a page table
      corruption error.
      
      This change prevents this page table corruption on x86 by refusing
      to mmap offsets higher than the highest valid address in the system.
      Signed-off-by: NCraig Bergstrom <craigb@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dsafonov@virtuozzo.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce56a86e
  3. 19 10月, 2017 5 次提交
  4. 18 10月, 2017 6 次提交
  5. 17 10月, 2017 1 次提交
  6. 16 10月, 2017 3 次提交
  7. 14 10月, 2017 3 次提交
    • A
      ARM: dts: imx7d: Invert legacy PCI irq mapping · 1c86c9dd
      Andrey Smirnov 提交于
      According to i.MX7D reference manual (Rev. 0.1, table 7-1, page 1221)
      legacy PCI interrupt mapping is as follows:
      
       - PCIE INT A is IRQ 122
       - PCIE INT B is IRQ 123
       - PCIE INT C is IRQ 124
       - PCIE INT D is IRQ 125
      
      Invert the mapping information in corresponding DT node to reflect
      that.
      
      Cc: yurovsky@gmail.com
      Cc: Fabio Estevam <fabio.estevam@nxp.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrey Smirnov <andrew.smirnov@gmail.com>
      Fixes: a816d575 ("ARM: dts: imx7d: Add node for PCIe controller")
      Signed-off-by: NShawn Guo <shawnguo@kernel.org>
      1c86c9dd
    • B
      x86/microcode: Do the family check first · 1f161f67
      Borislav Petkov 提交于
      On CPUs like AMD's Geode, for example, we shouldn't even try to load
      microcode because they do not support the modern microcode loading
      interface.
      
      However, we do the family check *after* the other checks whether the
      loader has been disabled on the command line or whether we're running in
      a guest.
      
      So move the family checks first in order to exit early if we're being
      loaded on an unsupported family.
      Reported-and-tested-by: NSven Glodowski <glodi1@arcor.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.11..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
      Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f161f67
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
  8. 13 10月, 2017 1 次提交
    • A
      powerpc/perf: Fix IMC initialization crash · 0d8ba162
      Anju T Sudhakar 提交于
      Panic observed with latest firmware, and upstream kernel:
      
       NIP init_imc_pmu+0x8c/0xcf0
       LR  init_imc_pmu+0x2f8/0xcf0
       Call Trace:
         init_imc_pmu+0x2c8/0xcf0 (unreliable)
         opal_imc_counters_probe+0x300/0x400
         platform_drv_probe+0x64/0x110
         driver_probe_device+0x3d8/0x580
         __driver_attach+0x14c/0x1a0
         bus_for_each_dev+0x8c/0xf0
         driver_attach+0x34/0x50
         bus_add_driver+0x298/0x350
         driver_register+0x9c/0x180
         __platform_driver_register+0x5c/0x70
         opal_imc_driver_init+0x2c/0x40
         do_one_initcall+0x64/0x1d0
         kernel_init_freeable+0x280/0x374
         kernel_init+0x24/0x160
         ret_from_kernel_thread+0x5c/0x74
      
      While registering nest imc at init, cpu-hotplug callback
      nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if
      the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup
      memory and cpuhotplug setup.
      
      But when cleaning up the attribute group, we are dereferencing the
      attribute element array without checking whether the backing element
      is not NULL. This causes the kernel panic.
      
      Add a check for the backing element prior to dereferencing the
      attribute element, to handle the failing case gracefully.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: NPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      [mpe: Trim change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d8ba162
  9. 12 10月, 2017 8 次提交
    • L
      x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping · 616dd587
      Len Brown 提交于
      SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode
      version number than stepping-4.  Linux needs to know this stepping-3
      specific version number to also enable the TSC_DEADLINE on stepping-3.
      
      The steppings and ucode versions are documented in the SKX BIOS update:
      https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txtSigned-off-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com
      616dd587
    • P
      x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors · cc6afe22
      Paolo Bonzini 提交于
      Commit 594a30fb ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled
      due to Errata" on CPUs without the feature", 2017-08-30) was also about
      silencing the warning on VirtualBox; however, KVM does expose the TSC
      deadline timer, and it's virtualized so that it is immune from CPU errata.
      
      Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs:
      
           [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                          please update microcode to version: 0xb2 (or later)
      
      Even if you had a hypervisor that does _not_ virtualize the TSC deadline
      and rather exposes the hardware one, it should be the hypervisors task
      to update microcode and possibly hide the flag from CPUID.  So just
      hide the message when running on _any_ hypervisor, not just those that
      do not support the TSC deadline timer.
      
      The older check still makes sense, so keep it.
      
      Fixes: bd9240a1 ("x86/apic: Add TSC_DEADLINE quirk due to errata")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com
      cc6afe22
    • A
      powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() · cd4f2b30
      Anju T Sudhakar 提交于
      Stack trace output during a stress test:
       [    4.310049] Freeing initrd memory: 22592K
      [    4.310646] rtas_flash: no firmware flash support
      [    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
      [    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
      [    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
      [    4.313588] Call Trace:
      [    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
      [    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
      [    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
      [    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
      [    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
      [    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
      [    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
      [    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
      [    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
      [    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
      [    4.314313] Mem-Info:
      [    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0
      
      core_imc_mem_init() at system boot use alloc_pages_node() to get memory
      and alloc_pages_node() throws this stack dump when tried to allocate
      memory from a node which has no memory behind it. Add a ___GFP_NOWARN
      flag in allocation request as a fix.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: NVenkat R.B <venkatb3@in.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cd4f2b30
    • A
      powerpc/perf: Fix for core/nest imc call trace on cpuhotplug · 0d923820
      Anju T Sudhakar 提交于
      Nest/core pmu units are enabled only when it is used. A reference count is
      maintained for the events which uses the nest/core pmu units. Currently in
      *_imc_counters_release function a WARN() is used for notification of any
      underflow of ref count.
      
      The case where event ref count hit a negative value is, when perf session is
      started, followed by offlining of all cpus in a given core.
      i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
      ref->count to zero, if the current cpu which is about to offline is the last
      cpu in a given core and make an OPAL call to disable the engine in that core.
      And on perf session termination, perf->destroy (core_imc_counters_release) will
      first decrement the ref->count for this core and based on the ref->count value
      an opal call is made to disable the core-imc engine.
      Now, since cpuhotplug path already clears the ref->count for core and disabled
      the engine, perf->destroy() decrementing again at event termination make it
      negative which in turn fires the WARN_ON. The same happens for nest units.
      
      Add a check to see if the reference count is alreday zero, before decrementing
      the count, so that the ref count will not hit a negative value.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: NSantosh Sivaraj <santosh@fossix.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d923820
    • H
      KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit · 8eb3f87d
      Haozhong Zhang 提交于
      When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
      guest CR4. Before this CR4 loading, the guest CR4 refers to L2
      CR4. Because these two CR4's are in different levels of guest, we
      should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
      is used to handle guest writes to its CR4, checks the guest change to
      CR4 and may fail if the change is invalid.
      
      The failure may cause trouble. Consider we start
        a L1 guest with non-zero L1 PCID in use,
           (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
      and
        a L2 guest with L2 PCID disabled,
           (i.e. L2 CR4.PCIDE == 0)
      and following events may happen:
      
      1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
         into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
         of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
         vcpu->arch.cr4) is left to the value of L2 CR4.
      
      2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
         kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
         because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
         CR3.PCID != 0, L0 KVM will inject GP to L1 guest.
      
      Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
      Cc: qemu-stable@nongnu.org
      Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8eb3f87d
    • N
      ARM: 8704/1: semihosting: use proper instruction on v7m processors · ee3eaee6
      Nicolas Pitre 提交于
      The svc instruction doesn't exist on v7m processors. Semihosting ops are
      invoked with the bkpt instruction instead.
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      ee3eaee6
    • L
      ARM: 8701/1: fix sparse flags for build on 64bit machines · 6042b8c7
      Luc Van Oostenryck II 提交于
      By default sparse uses the characteristics of the build
      machine to infer things like the wordsize.
      This is fine when doing native builds but for ARM it's,
      I suspect, very rarely the case and if the build are done
      on a 64bit machine we get a bunch of warnings like:
        'cast truncates bits from constant value (... becomes ...)'
      
      Fix this by adding the -m32 flags for sparse.
      Reported-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      6042b8c7
    • N
      ARM: 8700/1: nommu: always reserve address 0 away · 195320fd
      Nicolas Pitre 提交于
      Some nommu systems have RAM at address 0. When vectors are not located
      there, the very beginning of memory remains available for dynamic
      allocations. The memblock allocator explicitly skips the first page
      but the standard page allocator does not, and while it correctly returns
      a non-null struct page pointer for that page, page_address() gives 0
      which gets confused with NULL (out of memory) by callers despite having
      plenty of free memory left.
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      195320fd
  10. 11 10月, 2017 1 次提交
  11. 10 10月, 2017 10 次提交