1. 16 10月, 2017 2 次提交
  2. 14 10月, 2017 2 次提交
    • B
      x86/microcode: Do the family check first · 1f161f67
      Borislav Petkov 提交于
      On CPUs like AMD's Geode, for example, we shouldn't even try to load
      microcode because they do not support the modern microcode loading
      interface.
      
      However, we do the family check *after* the other checks whether the
      loader has been disabled on the command line or whether we're running in
      a guest.
      
      So move the family checks first in order to exit early if we're being
      loaded on an unsupported family.
      Reported-and-tested-by: NSven Glodowski <glodi1@arcor.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.11..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
      Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f161f67
    • A
      x86/mm: Flush more aggressively in lazy TLB mode · b956575b
      Andy Lutomirski 提交于
      Since commit:
      
        94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      
      x86's lazy TLB mode has been all the way lazy: when running a kernel thread
      (including the idle thread), the kernel keeps using the last user mm's
      page tables without attempting to maintain user TLB coherence at all.
      
      From a pure semantic perspective, this is fine -- kernel threads won't
      attempt to access user pages, so having stale TLB entries doesn't matter.
      
      Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
      we also allow any paging-structure caches that may exist on the CPU
      to become incoherent.  This means that we can have a
      paging-structure cache entry that references a freed page table, and
      the CPU is within its rights to do a speculative page walk starting
      at the freed page table.
      
      I can imagine this causing two different problems:
      
       - A speculative page walk starting from a bogus page table could read
         IO addresses.  I haven't seen any reports of this causing problems.
      
       - A speculative page walk that involves a bogus page table can install
         garbage in the TLB.  Such garbage would always be at a user VA, but
         some AMD CPUs have logic that triggers a machine check when it notices
         these bogus entries.  I've seen a couple reports of this.
      
      Boris further explains the failure mode:
      
      > It is actually more of an optimization which assumes that paging-structure
      > entries are in WB DRAM:
      >
      > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
      > performance optimization that assumes PML4, PDP, PDE, and PTE entries
      > are in cacheable WB-DRAM; memory type checks may be bypassed, and
      > addresses outside of WB-DRAM may result in undefined behavior or NB
      > protocol errors. 1=Disables performance optimization and allows PML4,
      > PDP, PDE and PTE entries to be in any memory type. Operating systems
      > that maintain page tables in memory types other than WB- DRAM must set
      > TlbCacheDis to insure proper operation."
      >
      > The MCE generated is an NB protocol error to signal that
      >
      > "Link: A specific coherent-only packet from a CPU was issued to an
      > IO link. This may be caused by software which addresses page table
      > structures in a memory type other than cacheable WB-DRAM without
      > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
      > example, when page table structure addresses are above top of memory. In
      > such cases, the NB will generate an MCE if it sees a mismatch between
      > the memory operation generated by the core and the link type."
      >
      > I'm assuming coherent-only packets don't go out on IO links, thus the
      > error.
      
      To fix this, reinstate TLB coherence in lazy mode.  With this patch
      applied, we do it in one of two ways:
      
       - If we have PCID, we simply switch back to init_mm's page tables
         when we enter a kernel thread -- this seems to be quite cheap
         except for the cost of serializing the CPU.
      
       - If we don't have PCID, then we set a flag and switch to init_mm
         the first time we would otherwise need to flush the TLB.
      
      The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
      to override the default mode for benchmarking.
      
      In theory, we could optimize this better by only flushing the TLB in
      lazy CPUs when a page table is freed.  Doing that would require
      auditing the mm code to make sure that all page table freeing goes
      through tlb_remove_page() as well as reworking some data structures
      to implement the improved flush logic.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NAdam Borowski <kilobyte@angband.pl>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
      Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b956575b
  3. 13 10月, 2017 1 次提交
    • A
      powerpc/perf: Fix IMC initialization crash · 0d8ba162
      Anju T Sudhakar 提交于
      Panic observed with latest firmware, and upstream kernel:
      
       NIP init_imc_pmu+0x8c/0xcf0
       LR  init_imc_pmu+0x2f8/0xcf0
       Call Trace:
         init_imc_pmu+0x2c8/0xcf0 (unreliable)
         opal_imc_counters_probe+0x300/0x400
         platform_drv_probe+0x64/0x110
         driver_probe_device+0x3d8/0x580
         __driver_attach+0x14c/0x1a0
         bus_for_each_dev+0x8c/0xf0
         driver_attach+0x34/0x50
         bus_add_driver+0x298/0x350
         driver_register+0x9c/0x180
         __platform_driver_register+0x5c/0x70
         opal_imc_driver_init+0x2c/0x40
         do_one_initcall+0x64/0x1d0
         kernel_init_freeable+0x280/0x374
         kernel_init+0x24/0x160
         ret_from_kernel_thread+0x5c/0x74
      
      While registering nest imc at init, cpu-hotplug callback
      nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if
      the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup
      memory and cpuhotplug setup.
      
      But when cleaning up the attribute group, we are dereferencing the
      attribute element array without checking whether the backing element
      is not NULL. This causes the kernel panic.
      
      Add a check for the backing element prior to dereferencing the
      attribute element, to handle the failing case gracefully.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: NPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      [mpe: Trim change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d8ba162
  4. 12 10月, 2017 5 次提交
    • L
      x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping · 616dd587
      Len Brown 提交于
      SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode
      version number than stepping-4.  Linux needs to know this stepping-3
      specific version number to also enable the TSC_DEADLINE on stepping-3.
      
      The steppings and ucode versions are documented in the SKX BIOS update:
      https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txtSigned-off-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com
      616dd587
    • P
      x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors · cc6afe22
      Paolo Bonzini 提交于
      Commit 594a30fb ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled
      due to Errata" on CPUs without the feature", 2017-08-30) was also about
      silencing the warning on VirtualBox; however, KVM does expose the TSC
      deadline timer, and it's virtualized so that it is immune from CPU errata.
      
      Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs:
      
           [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                          please update microcode to version: 0xb2 (or later)
      
      Even if you had a hypervisor that does _not_ virtualize the TSC deadline
      and rather exposes the hardware one, it should be the hypervisors task
      to update microcode and possibly hide the flag from CPUID.  So just
      hide the message when running on _any_ hypervisor, not just those that
      do not support the TSC deadline timer.
      
      The older check still makes sense, so keep it.
      
      Fixes: bd9240a1 ("x86/apic: Add TSC_DEADLINE quirk due to errata")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com
      cc6afe22
    • A
      powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() · cd4f2b30
      Anju T Sudhakar 提交于
      Stack trace output during a stress test:
       [    4.310049] Freeing initrd memory: 22592K
      [    4.310646] rtas_flash: no firmware flash support
      [    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
      [    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
      [    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
      [    4.313588] Call Trace:
      [    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
      [    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
      [    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
      [    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
      [    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
      [    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
      [    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
      [    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
      [    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
      [    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
      [    4.314313] Mem-Info:
      [    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0
      
      core_imc_mem_init() at system boot use alloc_pages_node() to get memory
      and alloc_pages_node() throws this stack dump when tried to allocate
      memory from a node which has no memory behind it. Add a ___GFP_NOWARN
      flag in allocation request as a fix.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: NVenkat R.B <venkatb3@in.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cd4f2b30
    • A
      powerpc/perf: Fix for core/nest imc call trace on cpuhotplug · 0d923820
      Anju T Sudhakar 提交于
      Nest/core pmu units are enabled only when it is used. A reference count is
      maintained for the events which uses the nest/core pmu units. Currently in
      *_imc_counters_release function a WARN() is used for notification of any
      underflow of ref count.
      
      The case where event ref count hit a negative value is, when perf session is
      started, followed by offlining of all cpus in a given core.
      i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
      ref->count to zero, if the current cpu which is about to offline is the last
      cpu in a given core and make an OPAL call to disable the engine in that core.
      And on perf session termination, perf->destroy (core_imc_counters_release) will
      first decrement the ref->count for this core and based on the ref->count value
      an opal call is made to disable the core-imc engine.
      Now, since cpuhotplug path already clears the ref->count for core and disabled
      the engine, perf->destroy() decrementing again at event termination make it
      negative which in turn fires the WARN_ON. The same happens for nest units.
      
      Add a check to see if the reference count is alreday zero, before decrementing
      the count, so that the ref count will not hit a negative value.
      Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: NSantosh Sivaraj <santosh@fossix.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0d923820
    • H
      KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit · 8eb3f87d
      Haozhong Zhang 提交于
      When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
      guest CR4. Before this CR4 loading, the guest CR4 refers to L2
      CR4. Because these two CR4's are in different levels of guest, we
      should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
      is used to handle guest writes to its CR4, checks the guest change to
      CR4 and may fail if the change is invalid.
      
      The failure may cause trouble. Consider we start
        a L1 guest with non-zero L1 PCID in use,
           (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
      and
        a L2 guest with L2 PCID disabled,
           (i.e. L2 CR4.PCIDE == 0)
      and following events may happen:
      
      1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
         into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
         of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
         vcpu->arch.cr4) is left to the value of L2 CR4.
      
      2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
         kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
         because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
         CR3.PCID != 0, L0 KVM will inject GP to L1 guest.
      
      Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
      Cc: qemu-stable@nongnu.org
      Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8eb3f87d
  5. 11 10月, 2017 1 次提交
  6. 10 10月, 2017 14 次提交
    • L
      KVM: MMU: always terminate page walks at level 1 · 829ee279
      Ladi Prosek 提交于
      is_last_gpte() is not equivalent to the pseudo-code given in commit
      6bb69c9b ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
      value of last_nonleaf_level may override the result even if level == 1.
      
      It is critical for is_last_gpte() to return true on level == 1 to
      terminate page walks. Otherwise memory corruption may occur as level
      is used as an index to various data structures throughout the page
      walking code.  Even though the actual bug would be wherever the MMU is
      initialized (as in the previous patch), be defensive and ensure here
      that is_last_gpte() returns the correct value.
      
      This patch is also enough to fix CVE-2017-12188.
      
      Fixes: 6bb69c9b
      Cc: stable@vger.kernel.org
      Cc: Andy Honig <ahonig@google.com>
      Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      [Panic if walk_addr_generic gets an incorrect level; this is a serious
       bug and it's not worth a WARN_ON where the recovery path might hide
       further exploitable issues; suggested by Andrew Honig. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      829ee279
    • L
      KVM: nVMX: update last_nonleaf_level when initializing nested EPT · fd19d3b4
      Ladi Prosek 提交于
      The function updates context->root_level but didn't call
      update_last_nonleaf_level so the previous and potentially wrong value
      was used for page walks.  For example, a zero value of last_nonleaf_level
      would allow a potential out-of-bounds access in arch/x86/mmu/paging_tmpl.h's
      walk_addr_generic function (CVE-2017-12188).
      
      Fixes: 155a97a3Signed-off-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fd19d3b4
    • Z
      xen/vcpu: Use a unified name about cpu hotplug state for pv and pvhvm · eac779aa
      Zhenzhong Duan 提交于
      As xen_cpuhp_setup is called by PV and PVHVM, the name of "x86/xen/hvm_guest"
      is confusing.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      eac779aa
    • M
      x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing · ab7ff471
      Marcelo Henrique Cerri 提交于
      Do not consider the fixed size of hv_vp_set when passing the variable
      header size to hv_do_rep_hypercall().
      
      The Hyper-V hypervisor specification states that for a hypercall with a
      variable header only the size of the variable portion should be supplied
      via the input control.
      
      For HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX calls that means the
      fixed portion of hv_vp_set should not be considered.
      
      That fixes random failures of some applications that are unexpectedly
      killed with SIGBUS or SIGSEGV.
      Signed-off-by: NMarcelo Henrique Cerri <marcelo.cerri@canonical.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Fixes: 628f54cc ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")
      Link: http://lkml.kernel.org/r/1507210469-29065-1-git-send-email-marcelo.cerri@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab7ff471
    • V
      x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures · 60d73a7c
      Vitaly Kuznetsov 提交于
      hv_do_hypercall() does virt_to_phys() translation and with some configs
      (CONFIG_SLAB) this doesn't work for percpu areas, we pass wrong memory to
      hypervisor and get #GP. We could use working slow_virt_to_phys() instead
      but doing so kills the performance.
      
      Move pcpu_flush/pcpu_flush_ex structures out of percpu areas and
      allocate memory on first call. The additional level of indirection gives
      us a small performance penalty, in future we may consider introducing
      hypercall functions which avoid virt_to_phys() conversion and cache
      physical addresses of pcpu_flush/pcpu_flush_ex structures somewhere.
      Reported-by: NSimon Xiao <sixiao@microsoft.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/20171005113924.28021-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      60d73a7c
    • V
      x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs · a3b74243
      Vitaly Kuznetsov 提交于
      hv_flush_pcpu_ex structures are not cleared between calls for performance
      reasons (they're variable size up to PAGE_SIZE each) but we must clear
      hv_vp_set.bank_contents part of it to avoid flushing unneeded vCPUs. The
      rest of the structure is formed correctly.
      
      To do the clearing in an efficient way stash the maximum possible vCPU
      number (this may differ from Linux CPU id).
      Reported-by: NJork Loeser <Jork.Loeser@microsoft.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/20171006154854.18092-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a3b74243
    • C
      perf/x86/intel/uncore: Fix memory leaks on allocation failures · 629eb703
      Colin Ian King 提交于
      Currently if an allocation fails then the error return paths
      don't free up any currently allocated pmus[].boxes and pmus causing
      a memory leak.  Add an error clean up exit path that frees these
      objects.
      
      Detected by CoverityScan, CID#711632 ("Resource Leak")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Fixes: 087bfbb0 ("perf/x86: Add generic Intel uncore PMU support")
      Link: http://lkml.kernel.org/r/20171009172655.6132-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      629eb703
    • J
      x86/unwind: Disable unwinder warnings on 32-bit · d4a2d031
      Josh Poimboeuf 提交于
      x86-32 doesn't have stack validation, so in most cases it doesn't make
      sense to warn about bad frame pointers.
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a69658760800bf281e6353248c23e0fa0acf5230.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4a2d031
    • J
      x86/unwind: Align stack pointer in unwinder dump · 99bd28a4
      Josh Poimboeuf 提交于
      When printing the unwinder dump, the stack pointer could be unaligned,
      for one of two reasons:
      
      - stack corruption; or
      
      - GCC created an unaligned stack.
      
      There's no way for the unwinder to tell the difference between the two,
      so we have to assume one or the other.  GCC unaligned stacks are very
      rare, and have only been spotted before GCC 5.  Presumably, if we're
      doing an unwinder stack dump, stack corruption is more likely than a
      GCC unaligned stack.  So always align the stack before starting the
      dump.
      Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      99bd28a4
    • J
      x86/unwind: Use MSB for frame pointer encoding on 32-bit · 5c99b692
      Josh Poimboeuf 提交于
      On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings
      like:
      
        WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000
      
      And also there were some stack dumps with a bunch of unreliable '?'
      symbols after an apic_timer_interrupt symbol, meaning the unwinder got
      confused when it tried to read the regs.
      
      The cause of those issues is that, with GCC 4.8 (and possibly older),
      there are cases where GCC misaligns the stack pointer in a leaf function
      for no apparent reason:
      
        c124a388 <acpi_rs_move_data>:
        c124a388:       55                      push   %ebp
        c124a389:       89 e5                   mov    %esp,%ebp
        c124a38b:       57                      push   %edi
        c124a38c:       56                      push   %esi
        c124a38d:       89 d6                   mov    %edx,%esi
        c124a38f:       53                      push   %ebx
        c124a390:       31 db                   xor    %ebx,%ebx
        c124a392:       83 ec 03                sub    $0x3,%esp
        ...
        c124a3e3:       83 c4 03                add    $0x3,%esp
        c124a3e6:       5b                      pop    %ebx
        c124a3e7:       5e                      pop    %esi
        c124a3e8:       5f                      pop    %edi
        c124a3e9:       5d                      pop    %ebp
        c124a3ea:       c3                      ret
      
      If an interrupt occurs in such a function, the regs on the stack will be
      unaligned, which breaks the frame pointer encoding assumption.  So on
      32-bit, use the MSB instead of the LSB to encode the regs.
      
      This isn't an issue on 64-bit, because interrupts align the stack before
      writing to it.
      Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5c99b692
    • J
      x86/unwind: Fix dereference of untrusted pointer · 62dd86ac
      Josh Poimboeuf 提交于
      Tetsuo Handa and Fengguang Wu reported a panic in the unwinder:
      
        BUG: unable to handle kernel NULL pointer dereference at 000001f2
        IP: update_stack_state+0xd4/0x340
        *pde = 00000000
      
        Oops: 0000 [#1] PREEMPT SMP
        CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be676 #592
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
        task: bb0b53c0 task.stack: bb3ac000
        EIP: update_stack_state+0xd4/0x340
        EFLAGS: 00010002 CPU: 0
        EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570
        ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f
         DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
        CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690
        DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
        DR6: fffe0ff0 DR7: 00000400
        Call Trace:
         ? unwind_next_frame+0xea/0x400
         ? __unwind_start+0xf5/0x180
         ? __save_stack_trace+0x81/0x160
         ? save_stack_trace+0x20/0x30
         ? __lock_acquire+0xfa5/0x12f0
         ? lock_acquire+0x1c2/0x230
         ? tick_periodic+0x3a/0xf0
         ? _raw_spin_lock+0x42/0x50
         ? tick_periodic+0x3a/0xf0
         ? tick_periodic+0x3a/0xf0
         ? debug_smp_processor_id+0x12/0x20
         ? tick_handle_periodic+0x23/0xc0
         ? local_apic_timer_interrupt+0x63/0x70
         ? smp_trace_apic_timer_interrupt+0x235/0x6a0
         ? trace_apic_timer_interrupt+0x37/0x3c
         ? strrchr+0x23/0x50
        Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83
        EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f
        CR2: 00000000000001f2
        ---[ end trace 0d147fd4aba8ff50 ]---
        Kernel panic - not syncing: Fatal exception in interrupt
      
      On x86-32, after decoding a frame pointer to get a regs address,
      regs_size() dereferences the regs pointer when it checks regs->cs to see
      if the regs are user mode.  This is dangerous because it's possible that
      what looks like a decoded frame pointer is actually a corrupt value, and
      we don't want the unwinder to make things worse.
      
      Instead of calling regs_size() on an unsafe pointer, just assume they're
      kernel regs to start with.  Later, once it's safe to access the regs, we
      can do the user mode check and corresponding safety check for the
      remaining two regs.
      Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 5ed8d8bb ("x86/unwind: Move common code into update_stack_state()")
      Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      62dd86ac
    • T
      powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology() · 6b2c08f9
      Thiago Jung Bauermann 提交于
      It turns out that not all paths calling arch_update_cpu_topology() hold
      cpu_hotplug_lock, but that's OK because those paths can't race with
      any concurrent hotplug events.
      
      Warnings were reported with the following trace:
      
        lockdep_assert_cpus_held
        arch_update_cpu_topology
        sched_init_domains
        sched_init_smp
        kernel_init_freeable
        kernel_init
        ret_from_kernel_thread
      
      Which is safe because it's called early in boot when hotplug is not
      live yet.
      
      And also this trace:
      
        lockdep_assert_cpus_held
        arch_update_cpu_topology
        partition_sched_domains
        cpuset_update_active_cpus
        sched_cpu_deactivate
        cpuhp_invoke_callback
        cpuhp_down_callbacks
        cpuhp_thread_fun
        smpboot_thread_fn
        kthread
        ret_from_kernel_thread
      
      Which is safe because it's called as part of CPU hotplug, so although
      we don't hold the CPU hotplug lock, there is another thread driving
      the CPU hotplug operation which does hold the lock, and there is no
      race.
      
      Thanks to tglx for deciphering it for us.
      
      Fixes: 3e401f7a ("powerpc: Only obtain cpu_hotplug_lock if called by rtasd")
      Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6b2c08f9
    • S
      powerpc/lib/sstep: Fix count leading zeros instructions · b0490a04
      Sandipan Das 提交于
      According to the GCC documentation, the behaviour of __builtin_clz()
      and __builtin_clzl() is undefined if the value of the input argument
      is zero. Without handling this special case, these builtins have been
      used for emulating the following instructions:
        * Count Leading Zeros Word (cntlzw[.])
        * Count Leading Zeros Doubleword (cntlzd[.])
      
      This fixes the emulated behaviour of these instructions by adding an
      additional check for this special case.
      
      Fixes: 3cdfcbfd ("powerpc: Change analyse_instr so it doesn't modify *regs")
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Reviewed-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b0490a04
    • K
      powerpc/livepatch: Fix livepatch stack access · e36a82ee
      Kamalesh Babulal 提交于
      While running stress test with livepatch module loaded, kernel bug was
      triggered.
      
        cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60]
        5:mon> t
        [c0000000eb9d3de0] c0000000eb9d3e30 (unreliable)
        [c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120
         --- Exception: 501 (Hardware Interrupt) at c000000000053040 livepatch_handler+0x4c/0x74
        [c0000000eb9d4120] 0000000057ac6e9d (unreliable)
        [d0000000089d9f78] 2e0965747962382e
        SP (965747962342e09) is in userspace
      
      When an interrupt occurs during the livepatch_handler execution, it's
      possible for the livepatch_stack and/or thread_info to be corrupted.
      eg:
      
        Task A                        Interrupt Handler
        =========                     =================
        livepatch_handler:
        mr r0, r1
        ld r1, TI_livepatch_sp(r12)
                                      hardware_interrupt_common:
                                        do_IRQ+0x8:
                                          mflr    r0          <- saved stack pointer is overwritten
                                          bl      _mcount
                                          ...
                                          std     r27,-40(r1) <- overwrite of thread_info()
      
        lis r2, STACK_END_MAGIC@h
        ori r2, r2, STACK_END_MAGIC@l
        ld  r12, -8(r1)
      
      Fix the corruption by using r11 register for livepatch stack
      manipulation, instead of shuffling task stack and livepatch stack into
      r1 register. Using r11 register also avoids disabling/enabling irq's
      while setting up the livepatch stack.
      Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Reviewed-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e36a82ee
  7. 09 10月, 2017 8 次提交
    • D
      Revert commit 1a8b6d76 ("net:add one common config...") · f4986d25
      Ding Tianhong 提交于
      The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
      to indicate that Relaxed Ordering Attributes (RO) should not
      be used for Transaction Layer Packets (TLP) targeted toward
      these affected Root Port, it will clear the bit4 in the PCIe
      Device Control register, so the PCIe device drivers could
      query PCIe configuration space to determine if it can send
      TLPs to Root Port with the Relaxed Ordering Attributes set.
      
      With this new flag  we don't need the config ARCH_WANT_RELAX_ORDER
      to control the Relaxed Ordering Attributes for the ixgbe drivers
      just like the commit 1a8b6d76 ("net:add one common config...") did,
      so revert this commit.
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      f4986d25
    • P
      MIPS: math-emu: Remove pr_err() calls from fpu_emu() · ca8eb05b
      Paul Burton 提交于
      The FPU emulator includes 2 calls to pr_err() which are triggered by
      invalid instruction encodings for MIPSr6 cmp.cond.fmt instructions.
      These cases are not kernel errors, merely invalid instructions which are
      already handled by delivering a SIGILL which will provide notification
      that something failed in cases where that makes sense.
      
      In cases where that SIGILL is somewhat expected & being handled, for
      example when crashme happens to generate one of the affected bad
      encodings, the message is printed with no useful context about what
      triggered it & spams the kernel log for no good reason.
      
      Remove the pr_err() calls to make crashme run silently & treat the bad
      encodings the same way we do others, with a SIGILL & no further kernel
      log output.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Fixes: f8c3c671 ("MIPS: math-emu: Add support for the CMP.condn.fmt R6 instruction")
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org> # v4.3+
      Patchwork: https://patchwork.linux-mips.org/patch/17253/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      ca8eb05b
    • P
      MIPS: Fix generic-board-config.sh for builds using O= · e1270575
      Paul Burton 提交于
      When configuring the kernel using one of the generic MIPS defconfig
      targets, the generic-board-config.sh script is used to check
      requirements listed in board config fragments against a reference config
      in order to determine which board config fragments to merge into the
      final config.
      
      When specifying O= to configure in a directory other than the kernel
      source directory, this generic-board-config.sh script is invoked in the
      directory that we are configuring in (ie. the directory that O equals),
      and the path to the reference config is relative to the current
      directory. The script then changes the current directory to the source
      tree, which unfortunately breaks later access to the reference file
      since its path is relative to a directory that is no longer the current
      working directory. This results in configuration failing with errors
      such as:
      
        $ make ARCH=mips O=tmp 32r2_defconfig
        make[1]: Entering directory '/home/pburton/src/linux/tmp'
        Using ../arch/mips/configs/generic_defconfig as base
        Merging ../arch/mips/configs/generic/32r2.config
        Merging ../arch/mips/configs/generic/eb.config
        grep: ./.config.32r2_defconfig: No such file or directory
        grep: ./.config.32r2_defconfig: No such file or directory
        The base file '.config' does not exist.  Exit.
        make[1]: *** [arch/mips/Makefile:505: 32r2_defconfig] Error 1
        make[1]: Leaving directory '/home/pburton/src/linux-ingenic/tmp'
        make: *** [Makefile:145: sub-make] Error 2
      
      Fix this by avoiding changing the working directory in
      generic-board-config.sh, instead using full paths to files under
      $(srctree)/ where necessary.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Fixes: 27e0d4b0 ("MIPS: generic: Allow filtering enabled boards by requirements")
      Cc: linux-mips@linux-mips.org
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Cc: kbuild-all@01.org
      Patchwork: https://patchwork.linux-mips.org/patch/17231/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      e1270575
    • P
      MIPS: Fix cmpxchg on 32b signed ints for 64b kernel with !kernel_uses_llsc · 133d68e0
      Paul Burton 提交于
      Commit 8263db4d ("MIPS: cmpxchg: Implement __cmpxchg() as a
      function") refactored our implementation of __cmpxchg() to be a function
      rather than a macro, with the aim of making it easier to read & modify.
      Unfortunately the commit breaks use of cmpxchg() for signed 32 bit
      values when we have a 64 bit kernel with kernel_uses_llsc == false,
      because:
      
       - In cmpxchg_local() we cast the old value to the type the pointer
         points to, and then to an unsigned long. If the pointer points to a
         signed type smaller than 64 bits then the old value will be sign
         extended to 64 bits. That is, bits beyond the size of the pointed to
         type will be set to 1 if the old value is negative. In the case of a
         signed 32 bit integer with a negative value, bits 63:32 will all be
         set.
      
       - In __cmpxchg_asm() we load the value from memory, ie. dereference the
         pointer, and store the value as an unsigned integer (__ret) whose
         size matches the pointer. For a 32 bit cmpxchg() this means we store
         the value in a u32, because the pointer provided to __cmpxchg_asm()
         by __cmpxchg() is of type volatile u32 *.
      
       - __cmpxchg_asm() then checks whether the value in memory (__ret)
         matches the provided old value, by comparing the two values. This
         results in the u32 being promoted to a 64 bit unsigned long to match
         the old argument - however because both types are unsigned the value
         is zero extended, which does not match the sign extension performed
         on the old value in cmpxchg_local() earlier.
      
      This mismatch means that unfortunate cmpxchg() calls can incorrectly
      fail for 64 bit kernels with kernel_uses_llsc == false. This is the case
      on at least non-SMP Cavium Octeon kernels, which hardcode
      kernel_uses_llsc in their cpu-feature-overrides.h header. Using a
      v4.13-rc7 kernel configured using cavium_octeon_defconfig with SMP
      manually disabled, this presents itself as oddity when we reach
      userland - for example:
      
        can't run '/bin/mount': Text file busy
        can't run '/bin/mkdir': Text file busy
        can't run '/bin/mkdir': Text file busy
        can't run '/bin/mount': Text file busy
        can't run '/bin/hostname': Text file busy
        can't run '/etc/init.d/rcS': Text file busy
        can't run '/sbin/getty': Text file busy
        can't run '/sbin/getty': Text file busy
      
      It appears that some part of the init process, which is in this case
      buildroot's busybox init, is running successfully. It never manages to
      reach the login prompt though, and complains about /sbin/getty being
      busy repeatedly and indefinitely.
      
      Fix this by casting the old value provided to __cmpxchg_asm() to an
      appropriately sized unsigned integer, such that we consistently
      zero-extend avoiding the mismatch. The __cmpxchg_small() case for 8 & 16
      bit values is unaffected because __cmpxchg_small() already masks
      provided values appropriately.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Fixes: 8263db4d ("MIPS: cmpxchg: Implement __cmpxchg() as a function")
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/17226/
      Cc: linux-mips@linux-mips.org
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      133d68e0
    • K
      MIPS: loongson1: set default number of rx and tx queues for stmmac · 1b6ad6df
      Kelvin Cheung 提交于
      Set the default number of RX and TX queues due to
      the recent changes of stmmac driver.
      Otherwise the ethernet will crash once it starts.
      Signed-off-by: NKelvin Cheung <keguang.zhang@gmail.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/17452/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      1b6ad6df
    • M
      MIPS: bpf: Fix uninitialised target compiler error · 94c3390a
      Matt Redfearn 提交于
      Compiling ebpf_jit.c with gcc 4.9 results in a (likely spurious)
      compiler warning, as gcc has detected that the variable "target" may be
      used uninitialised. Since -Werror is active, this is treated as an error
      and causes a kernel build failure whenever CONFIG_MIPS_EBPF_JIT is
      enabled.
      
      arch/mips/net/ebpf_jit.c: In function 'build_one_insn':
      arch/mips/net/ebpf_jit.c:1118:80: error: 'target' may be used
      uninitialized in this function [-Werror=maybe-uninitialized]
          emit_instr(ctx, j, target);
                                                                                      ^
      cc1: all warnings being treated as errors
      
      Fix this by initialising "target" to 0. If it really is used
      uninitialised this would result in a jump to 0 and a detectable run time
      failure.
      Signed-off-by: NMatt Redfearn <matt.redfearn@imgtec.com>
      Fixes: b6bd53f9 ("MIPS: Add missing file for eBPF JIT.")
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.13+
      Patchwork: https://patchwork.linux-mips.org/patch/17375/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      94c3390a
    • M
      x86/alternatives: Fix alt_max_short macro to really be a max() · 6b32c126
      Mathias Krause 提交于
      The alt_max_short() macro in asm/alternative.h does not work as
      intended, leading to nasty bugs. E.g. alt_max_short("1", "3")
      evaluates to 3, but alt_max_short("3", "1") evaluates to 1 -- not
      exactly the maximum of 1 and 3.
      
      In fact, I had to learn it the hard way by crashing my kernel in not
      so funny ways by attempting to make use of the ALTENATIVE_2 macro
      with alternatives where the first one was larger than the second
      one.
      
      According to [1] and commit dbe4058a ("x86/alternatives: Fix
      ALTERNATIVE_2 padding generation properly") the right handed side
      should read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the
      macro work as intended.
      
      While at it, fix up the comments regarding the additional "-", too.
      It's not about gas' usage of s32 but brain dead logic of having a
      "true" value of -1 for the < operator ... *sigh*
      
      Btw., the one in asm/alternative-asm.h is correct. And, apparently,
      all current users of ALTERNATIVE_2() pass same sized alternatives,
      avoiding to hit the bug.
      
      [1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMaxReviewed-and-tested-by: NBorislav Petkov <bp@suse.de>
      Fixes: dbe4058a ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly")
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1507228213-13095-1-git-send-email-minipli@googlemail.com
      6b32c126
    • A
      x86/mm/64: Fix reboot interaction with CR4.PCIDE · 924c6b90
      Andy Lutomirski 提交于
      Trying to reboot via real mode fails with PCID on: long mode cannot
      be exited while CR4.PCIDE is set.  (No, I have no idea why, but the
      SDM and actual CPUs are in agreement here.)  The result is a GPF and
      a hang instead of a reboot.
      
      I didn't catch this in testing because neither my computer nor my VM
      reboots this way.  I can trigger it with reboot=bios, though.
      
      Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
      Reported-and-tested-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org
      924c6b90
  8. 06 10月, 2017 7 次提交
    • E
      ARC: [plat-hsdk]: Add reset controller node to manage ethernet reset · ab8eb7db
      Eugeniy Paltsev 提交于
      DW ethernet controller on HSDK hangs sometimes after SW reset, so
      add reset node to make possible to reset DW ethernet controller HW.
      Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      ab8eb7db
    • S
      arm64: Ensure fpsimd support is ready before userspace is active · ae2e972d
      Suzuki K Poulose 提交于
      We register the pm/hotplug callbacks for FPSIMD as late_initcall,
      which happens after the userspace is active (from initramfs via
      populate_rootfs, a rootfs_initcall). Make sure we are ready even
      before the userspace could potentially use it, by promoting to
      a core_initcall.
      
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Dave Martin <dave.martin@arm.com>
      Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      ae2e972d
    • S
      arm64: Ensure the instruction emulation is ready for userspace · c0d8832e
      Suzuki K Poulose 提交于
      We trap and emulate some instructions (e.g, mrs, deprecated instructions)
      for the userspace. However the handlers for these are registered as
      late_initcalls and the userspace could be up and running from the initramfs
      by that time (with populate_rootfs, which is a rootfs_initcall()). This
      could cause problems for the early applications ending up in failure
      like :
      
      [   11.152061] modprobe[93]: undefined instruction: pc=0000ffff8ca48ff4
      
      This patch promotes the specific calls to core_initcalls, which are
      guaranteed to be completed before we hit userspace.
      
      Cc: stable@vger.kernel.org
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Matthias Brugger <mbrugger@suse.com>
      Cc: James Morse <james.morse@arm.com>
      Reported-by: NMatwey V. Kornilov <matwey.kornilov@gmail.com>
      Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      c0d8832e
    • G
      powerpc/tm: Fix illegal TM state in signal handler · 044215d1
      Gustavo Romero 提交于
      Currently it's possible that on returning from the signal handler
      through the restore_tm_sigcontexts() code path (e.g. from a signal
      caught due to a `trap` instruction executed in the middle of an HTM
      block, or a deliberately constructed sigframe) an illegal TM state
      (like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets
      implicitly the MSR register from SRR1 register on return to userspace
      it causes a TM Bad Thing exception.
      
      That illegal state can be set (a) by a malicious user that disables
      the TM bit by tweaking the bits in uc_mcontext before returning from
      the signal handler or (b) by a sufficient number of context switches
      occurring such that the load_tm counter overflows and TM is disabled
      whilst in the signal handler.
      
      This commit fixes the illegal TM state by ensuring that TM bit is
      always enabled before we return from restore_tm_sigcontexts(). A small
      comment correction is made as well.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NGustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      044215d1
    • C
      powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks · 265e60a1
      Cyril Bur 提交于
      When using transactional memory (TM), the CPU can be in one of six
      states as far as TM is concerned, encoded in the Machine State
      Register (MSR). Certain state transitions are illegal and if attempted
      trigger a "TM Bad Thing" type program check exception.
      
      If we ever hit one of these exceptions it's treated as a bug, ie. we
      oops, and kill the process and/or panic, depending on configuration.
      
      One case where we can trigger a TM Bad Thing, is when returning to
      userspace after a system call or interrupt, using RFID. When this
      happens the CPU first restores the user register state, in particular
      r1 (the stack pointer) and then attempts to update the MSR. However
      the MSR update is not allowed and so we take the program check with
      the user register state, but the kernel MSR.
      
      This tricks the exception entry code into thinking we have a bad
      kernel stack pointer, because the MSR says we're coming from the
      kernel, but r1 is pointing to userspace.
      
      To avoid this we instead always switch to the emergency stack if we
      take a TM Bad Thing from the kernel. That way none of the user
      register values are used, other than for printing in the oops message.
      
      This is the fix for CVE-2017-1000255.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      [mpe: Rewrite change log & comments, tweak asm slightly]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      265e60a1
    • A
      powerpc/powernv: Increase memory block size to 1GB on radix · 53ecde0b
      Anton Blanchard 提交于
      Memory hot unplug on PowerNV radix hosts is broken. Our memory block
      size is 256MB but since we map the linear region with very large
      pages, each pte we tear down maps 1GB.
      
      A hot unplug of one 256MB memory block results in 768MB of memory
      getting unintentionally unmapped. At this point we are likely to oops.
      
      Fix this by increasing our memory block size to 1GB on PowerNV radix
      hosts.
      
      Fixes: 4b5d62ca ("powerpc/mm: add radix__remove_section_mapping()")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      53ecde0b
    • A
      KVM: add X86_LOCAL_APIC dependency · e42eef4b
      Arnd Bergmann 提交于
      The rework of the posted interrupt handling broke building without
      support for the local APIC:
      
      ERROR: "boot_cpu_physical_apicid" [arch/x86/kvm/kvm-intel.ko] undefined!
      
      That configuration is probably not particularly useful anyway, so
      we can avoid the randconfig failures by adding a Kconfig dependency.
      
      Fixes: 8b306e2f ("KVM: VMX: avoid double list add with VT-d posted interrupts")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      e42eef4b