提交 · 4e57b94664fef55aa71cac33b4632fdfdd52b695 · openeuler / Kernel

18 10月, 2017 3 次提交

x86/mm: Tidy up "x86/mm: Flush more aggressively in lazy TLB mode" · 4e57b946

由 Andy Lutomirski 提交于 10月 14, 2017

Due to timezones, commit:

  b956575b ("x86/mm: Flush more aggressively in lazy TLB mode")

was an outdated patch that well tested and fixed the bug but didn't
address Borislav's review comments.

Tidy it up:

 - The name "tlb_use_lazy_mode()" was highly confusing.  Change it to
   "tlb_defer_switch_to_init_mm()", which describes what it actually
   means.

 - Move the static_branch crap into a helper.

 - Improve comments.

Actually removing the debugfs option is in the next patch.
Reported-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b956575b ("x86/mm: Flush more aggressively in lazy TLB mode")
Link: http://lkml.kernel.org/r/154ef95428d4592596b6e98b0af1d2747d6cfbf8.1508000261.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4e57b946

x86/mm/64: Remove the last VM_BUG_ON() from the TLB code · e8b9b0cc

由 Andy Lutomirski 提交于 10月 14, 2017

Let's avoid hard-to-diagnose crashes in the future.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f423bbc97864089fbdeb813f1ea126c6eaed844a.1508000261.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e8b9b0cc

x86/microcode/intel: Disable late loading on model 79 · 723f2828

由 Borislav Petkov 提交于 10月 18, 2017

Blacklist Broadwell X model 79 for late loading due to an erratum.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NTony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171018111225.25635-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

723f2828

17 10月, 2017 1 次提交

x86/idt: Initialize early IDT before cr4_init_shadow() · 9c48c096

由 Thomas Gleixner 提交于 10月 16, 2017

Moving the early IDT setup out of assembly code breaks the boot on first
generation 486 systems.

The reason is that the call of idt_setup_early_handler, which sets up the
early handlers was added after the call to cr4_init_shadow().

cr4_init_shadow() tries to read CR4 which is not available on those
systems. The accessor function uses a extable fixup to handle the resulting
fault. As the IDT is not set up yet, the cr4 read exception causes an
instantaneous reboot for obvious reasons.

Call idt_setup_early_handler() before cr4_init_shadow() so IDT is set up
before the first exception hits.

Fixes: 87e81786 ("x86/idt: Move early IDT setup out of 32-bit asm")
Reported-and-tested-by: NMatthew Whitehead <whiteheadm@acm.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710161210290.1973@nanos

9c48c096

16 10月, 2017 1 次提交

x86/cpu/intel_cacheinfo: Remove redundant assignment to 'this_leaf' · e6fc454b

由 Colin Ian King 提交于 10月 15, 2017

The 'this_leaf' variable is assigned a value that is never
read and it is updated a little later with a newer value,
hence we can remove the redundant assignment.

Cleans up the following Clang warning:

  Value stored to 'this_leaf' is never read
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20171015160203.12332-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e6fc454b

14 10月, 2017 2 次提交

x86/microcode: Do the family check first · 1f161f67

由 Borislav Petkov 提交于 10月 12, 2017

On CPUs like AMD's Geode, for example, we shouldn't even try to load
microcode because they do not support the modern microcode loading
interface.

However, we do the family check *after* the other checks whether the
loader has been disabled on the command line or whether we're running in
a guest.

So move the family checks first in order to exit early if we're being
loaded on an unsupported family.
Reported-and-tested-by: NSven Glodowski <glodi1@arcor.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.11..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

1f161f67

x86/mm: Flush more aggressively in lazy TLB mode · b956575b

由 Andy Lutomirski 提交于 10月 09, 2017

Since commit:

  94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")

x86's lazy TLB mode has been all the way lazy: when running a kernel thread
(including the idle thread), the kernel keeps using the last user mm's
page tables without attempting to maintain user TLB coherence at all.

From a pure semantic perspective, this is fine -- kernel threads won't
attempt to access user pages, so having stale TLB entries doesn't matter.

Unfortunately, I forgot about a subtlety.  By skipping TLB flushes,
we also allow any paging-structure caches that may exist on the CPU
to become incoherent.  This means that we can have a
paging-structure cache entry that references a freed page table, and
the CPU is within its rights to do a speculative page walk starting
at the freed page table.

I can imagine this causing two different problems:

 - A speculative page walk starting from a bogus page table could read
   IO addresses.  I haven't seen any reports of this causing problems.

 - A speculative page walk that involves a bogus page table can install
   garbage in the TLB.  Such garbage would always be at a user VA, but
   some AMD CPUs have logic that triggers a machine check when it notices
   these bogus entries.  I've seen a couple reports of this.

Boris further explains the failure mode:

> It is actually more of an optimization which assumes that paging-structure
> entries are in WB DRAM:
>
> "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
> performance optimization that assumes PML4, PDP, PDE, and PTE entries
> are in cacheable WB-DRAM; memory type checks may be bypassed, and
> addresses outside of WB-DRAM may result in undefined behavior or NB
> protocol errors. 1=Disables performance optimization and allows PML4,
> PDP, PDE and PTE entries to be in any memory type. Operating systems
> that maintain page tables in memory types other than WB- DRAM must set
> TlbCacheDis to insure proper operation."
>
> The MCE generated is an NB protocol error to signal that
>
> "Link: A specific coherent-only packet from a CPU was issued to an
> IO link. This may be caused by software which addresses page table
> structures in a memory type other than cacheable WB-DRAM without
> properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
> example, when page table structure addresses are above top of memory. In
> such cases, the NB will generate an MCE if it sees a mismatch between
> the memory operation generated by the core and the link type."
>
> I'm assuming coherent-only packets don't go out on IO links, thus the
> error.

To fix this, reinstate TLB coherence in lazy mode.  With this patch
applied, we do it in one of two ways:

 - If we have PCID, we simply switch back to init_mm's page tables
   when we enter a kernel thread -- this seems to be quite cheap
   except for the cost of serializing the CPU.

 - If we don't have PCID, then we set a flag and switch to init_mm
   the first time we would otherwise need to flush the TLB.

The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
to override the default mode for benchmarking.

In theory, we could optimize this better by only flushing the TLB in
lazy CPUs when a page table is freed.  Doing that would require
auditing the mm code to make sure that all page table freeing goes
through tlb_remove_page() as well as reworking some data structures
to implement the improved flush logic.
Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: NAdam Borowski <kilobyte@angband.pl>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 94b1b03b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

b956575b

13 10月, 2017 1 次提交

powerpc/perf: Fix IMC initialization crash · 0d8ba162

由 Anju T Sudhakar 提交于 10月 13, 2017

Panic observed with latest firmware, and upstream kernel:

 NIP init_imc_pmu+0x8c/0xcf0
 LR  init_imc_pmu+0x2f8/0xcf0
 Call Trace:
   init_imc_pmu+0x2c8/0xcf0 (unreliable)
   opal_imc_counters_probe+0x300/0x400
   platform_drv_probe+0x64/0x110
   driver_probe_device+0x3d8/0x580
   __driver_attach+0x14c/0x1a0
   bus_for_each_dev+0x8c/0xf0
   driver_attach+0x34/0x50
   bus_add_driver+0x298/0x350
   driver_register+0x9c/0x180
   __platform_driver_register+0x5c/0x70
   opal_imc_driver_init+0x2c/0x40
   do_one_initcall+0x64/0x1d0
   kernel_init_freeable+0x280/0x374
   kernel_init+0x24/0x160
   ret_from_kernel_thread+0x5c/0x74

While registering nest imc at init, cpu-hotplug callback
nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if
the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup
memory and cpuhotplug setup.

But when cleaning up the attribute group, we are dereferencing the
attribute element array without checking whether the backing element
is not NULL. This causes the kernel panic.

Add a check for the backing element prior to dereferencing the
attribute element, to handle the failing case gracefully.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: NPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[mpe: Trim change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0d8ba162

12 10月, 2017 5 次提交

x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping · 616dd587

由 Len Brown 提交于 10月 11, 2017

SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode
version number than stepping-4. Linux needs to know this stepping-3
specific version number to also enable the TSC_DEADLINE on stepping-3.

The steppings and ucode versions are documented in the SKX BIOS update:
https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txtSigned-off-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com

616dd587

x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors · cc6afe22

由 Paolo Bonzini 提交于 10月 10, 2017

Commit 594a30fb ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled
due to Errata" on CPUs without the feature", 2017-08-30) was also about
silencing the warning on VirtualBox; however, KVM does expose the TSC
deadline timer, and it's virtualized so that it is immune from CPU errata.

Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs:

     [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                    please update microcode to version: 0xb2 (or later)

Even if you had a hypervisor that does _not_ virtualize the TSC deadline
and rather exposes the hardware one, it should be the hypervisors task
to update microcode and possibly hide the flag from CPUID.  So just
hide the message when running on _any_ hypervisor, not just those that
do not support the TSC deadline timer.

The older check still makes sense, so keep it.

Fixes: bd9240a1 ("x86/apic: Add TSC_DEADLINE quirk due to errata")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com

cc6afe22

powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() · cd4f2b30

由 Anju T Sudhakar 提交于 10月 11, 2017

Stack trace output during a stress test:
 [    4.310049] Freeing initrd memory: 22592K
[    4.310646] rtas_flash: no firmware flash support
[    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
[    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
[    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
[    4.313588] Call Trace:
[    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
[    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
[    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
[    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
[    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
[    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
[    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
[    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
[    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
[    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
[    4.314313] Mem-Info:
[    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0

core_imc_mem_init() at system boot use alloc_pages_node() to get memory
and alloc_pages_node() throws this stack dump when tried to allocate
memory from a node which has no memory behind it. Add a ___GFP_NOWARN
flag in allocation request as a fix.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Reported-by: NVenkat R.B <venkatb3@in.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cd4f2b30

powerpc/perf: Fix for core/nest imc call trace on cpuhotplug · 0d923820

由 Anju T Sudhakar 提交于 10月 04, 2017

Nest/core pmu units are enabled only when it is used. A reference count is
maintained for the events which uses the nest/core pmu units. Currently in
*_imc_counters_release function a WARN() is used for notification of any
underflow of ref count.

The case where event ref count hit a negative value is, when perf session is
started, followed by offlining of all cpus in a given core.
i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
ref->count to zero, if the current cpu which is about to offline is the last
cpu in a given core and make an OPAL call to disable the engine in that core.
And on perf session termination, perf->destroy (core_imc_counters_release) will
first decrement the ref->count for this core and based on the ref->count value
an opal call is made to disable the core-imc engine.
Now, since cpuhotplug path already clears the ref->count for core and disabled
the engine, perf->destroy() decrementing again at event termination make it
negative which in turn fires the WARN_ON. The same happens for nest units.

Add a check to see if the reference count is alreday zero, before decrementing
the count, so that the ref count will not hit a negative value.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: NSantosh Sivaraj <santosh@fossix.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0d923820

KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit · 8eb3f87d

由 Haozhong Zhang 提交于 10月 10, 2017

When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
guest CR4. Before this CR4 loading, the guest CR4 refers to L2
CR4. Because these two CR4's are in different levels of guest, we
should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
is used to handle guest writes to its CR4, checks the guest change to
CR4 and may fail if the change is invalid.

The failure may cause trouble. Consider we start
  a L1 guest with non-zero L1 PCID in use,
     (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
and
  a L2 guest with L2 PCID disabled,
     (i.e. L2 CR4.PCIDE == 0)
and following events may happen:

1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
   into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
   of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
   vcpu->arch.cr4) is left to the value of L2 CR4.

2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
   kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
   because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
   CR3.PCID != 0, L0 KVM will inject GP to L1 guest.

Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
Cc: qemu-stable@nongnu.org
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8eb3f87d

11 10月, 2017 1 次提交

x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c · 67bb8e99

由 Tom Lendacky 提交于 10月 10, 2017

Some routines in mem_encrypt.c are called very early in the boot process,
e.g. sme_enable(). When CONFIG_KCOV=y is defined the resulting code added
to sme_enable() (and others) for KCOV instrumentation results in a kernel
crash. Disable the KCOV instrumentation for mem_encrypt.c by adding
KCOV_INSTRUMENT_mem_encrypt.o := n to arch/x86/mm/Makefile.

In order to avoid other possible early boot issues, model mem_encrypt.c
after head64.c in regards to tools. In addition to disabling KCOV as
stated above and a previous patch that disables branch profiling, also
remove the "-pg" CFLAG if CONFIG_FUNCTION_TRACER is enabled and set
KASAN_SANITIZE to "n", each of which are done on a file basis.
Reported-by: Nkernel test robot <lkp@01.org>
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171010194504.18887.38053.stgit@tlendack-t1.amdoffice.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

67bb8e99

10 10月, 2017 14 次提交

KVM: MMU: always terminate page walks at level 1 · 829ee279

由 Ladi Prosek 提交于 10月 05, 2017

is_last_gpte() is not equivalent to the pseudo-code given in commit
6bb69c9b ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
value of last_nonleaf_level may override the result even if level == 1.

It is critical for is_last_gpte() to return true on level == 1 to
terminate page walks. Otherwise memory corruption may occur as level
is used as an index to various data structures throughout the page
walking code.  Even though the actual bug would be wherever the MMU is
initialized (as in the previous patch), be defensive and ensure here
that is_last_gpte() returns the correct value.

This patch is also enough to fix CVE-2017-12188.

Fixes: 6bb69c9b
Cc: stable@vger.kernel.org
Cc: Andy Honig <ahonig@google.com>
Signed-off-by: NLadi Prosek <lprosek@redhat.com>
[Panic if walk_addr_generic gets an incorrect level; this is a serious
 bug and it's not worth a WARN_ON where the recovery path might hide
 further exploitable issues; suggested by Andrew Honig. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

829ee279

KVM: nVMX: update last_nonleaf_level when initializing nested EPT · fd19d3b4

由 Ladi Prosek 提交于 10月 05, 2017

The function updates context->root_level but didn't call
update_last_nonleaf_level so the previous and potentially wrong value
was used for page walks. For example, a zero value of last_nonleaf_level
would allow a potential out-of-bounds access in arch/x86/mmu/paging_tmpl.h's
walk_addr_generic function (CVE-2017-12188).

Fixes: 155a97a3Signed-off-by: NLadi Prosek <lprosek@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fd19d3b4

xen/vcpu: Use a unified name about cpu hotplug state for pv and pvhvm · eac779aa

由 Zhenzhong Duan 提交于 10月 08, 2017

As xen_cpuhp_setup is called by PV and PVHVM, the name of "x86/xen/hvm_guest"
is confusing.
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

eac779aa

x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing · ab7ff471

由 Marcelo Henrique Cerri 提交于 10月 05, 2017

Do not consider the fixed size of hv_vp_set when passing the variable
header size to hv_do_rep_hypercall().

The Hyper-V hypervisor specification states that for a hypercall with a
variable header only the size of the variable portion should be supplied
via the input control.

For HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX calls that means the
fixed portion of hv_vp_set should not be considered.

That fixes random failures of some applications that are unexpectedly
killed with SIGBUS or SIGSEGV.
Signed-off-by: NMarcelo Henrique Cerri <marcelo.cerri@canonical.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Fixes: 628f54cc ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")
Link: http://lkml.kernel.org/r/1507210469-29065-1-git-send-email-marcelo.cerri@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ab7ff471

x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures · 60d73a7c

由 Vitaly Kuznetsov 提交于 10月 05, 2017

hv_do_hypercall() does virt_to_phys() translation and with some configs
(CONFIG_SLAB) this doesn't work for percpu areas, we pass wrong memory to
hypervisor and get #GP. We could use working slow_virt_to_phys() instead
but doing so kills the performance.

Move pcpu_flush/pcpu_flush_ex structures out of percpu areas and
allocate memory on first call. The additional level of indirection gives
us a small performance penalty, in future we may consider introducing
hypercall functions which avoid virt_to_phys() conversion and cache
physical addresses of pcpu_flush/pcpu_flush_ex structures somewhere.
Reported-by: NSimon Xiao <sixiao@microsoft.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20171005113924.28021-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

60d73a7c

x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs · a3b74243

由 Vitaly Kuznetsov 提交于 10月 06, 2017

hv_flush_pcpu_ex structures are not cleared between calls for performance
reasons (they're variable size up to PAGE_SIZE each) but we must clear
hv_vp_set.bank_contents part of it to avoid flushing unneeded vCPUs. The
rest of the structure is formed correctly.

To do the clearing in an efficient way stash the maximum possible vCPU
number (this may differ from Linux CPU id).
Reported-by: NJork Loeser <Jork.Loeser@microsoft.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20171006154854.18092-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a3b74243

perf/x86/intel/uncore: Fix memory leaks on allocation failures · 629eb703

由 Colin Ian King 提交于 10月 09, 2017

Currently if an allocation fails then the error return paths
don't free up any currently allocated pmus[].boxes and pmus causing
a memory leak.  Add an error clean up exit path that frees these
objects.

Detected by CoverityScan, CID#711632 ("Resource Leak")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: 087bfbb0 ("perf/x86: Add generic Intel uncore PMU support")
Link: http://lkml.kernel.org/r/20171009172655.6132-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

629eb703

x86/unwind: Disable unwinder warnings on 32-bit · d4a2d031

由 Josh Poimboeuf 提交于 10月 09, 2017

x86-32 doesn't have stack validation, so in most cases it doesn't make
sense to warn about bad frame pointers.
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a69658760800bf281e6353248c23e0fa0acf5230.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d4a2d031

x86/unwind: Align stack pointer in unwinder dump · 99bd28a4

由 Josh Poimboeuf 提交于 10月 09, 2017

When printing the unwinder dump, the stack pointer could be unaligned,
for one of two reasons:

- stack corruption; or

- GCC created an unaligned stack.

There's no way for the unwinder to tell the difference between the two,
so we have to assume one or the other.  GCC unaligned stacks are very
rare, and have only been spotted before GCC 5.  Presumably, if we're
doing an unwinder stack dump, stack corruption is more likely than a
GCC unaligned stack.  So always align the stack before starting the
dump.
Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

99bd28a4

x86/unwind: Use MSB for frame pointer encoding on 32-bit · 5c99b692

由 Josh Poimboeuf 提交于 10月 09, 2017

On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings
like:

  WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000

And also there were some stack dumps with a bunch of unreliable '?'
symbols after an apic_timer_interrupt symbol, meaning the unwinder got
confused when it tried to read the regs.

The cause of those issues is that, with GCC 4.8 (and possibly older),
there are cases where GCC misaligns the stack pointer in a leaf function
for no apparent reason:

  c124a388 <acpi_rs_move_data>:
  c124a388:       55                      push   %ebp
  c124a389:       89 e5                   mov    %esp,%ebp
  c124a38b:       57                      push   %edi
  c124a38c:       56                      push   %esi
  c124a38d:       89 d6                   mov    %edx,%esi
  c124a38f:       53                      push   %ebx
  c124a390:       31 db                   xor    %ebx,%ebx
  c124a392:       83 ec 03                sub    $0x3,%esp
  ...
  c124a3e3:       83 c4 03                add    $0x3,%esp
  c124a3e6:       5b                      pop    %ebx
  c124a3e7:       5e                      pop    %esi
  c124a3e8:       5f                      pop    %edi
  c124a3e9:       5d                      pop    %ebp
  c124a3ea:       c3                      ret

If an interrupt occurs in such a function, the regs on the stack will be
unaligned, which breaks the frame pointer encoding assumption.  So on
32-bit, use the MSB instead of the LSB to encode the regs.

This isn't an issue on 64-bit, because interrupts align the stack before
writing to it.
Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

5c99b692

x86/unwind: Fix dereference of untrusted pointer · 62dd86ac

由 Josh Poimboeuf 提交于 10月 09, 2017

Tetsuo Handa and Fengguang Wu reported a panic in the unwinder:

  BUG: unable to handle kernel NULL pointer dereference at 000001f2
  IP: update_stack_state+0xd4/0x340
  *pde = 00000000

  Oops: 0000 [#1] PREEMPT SMP
  CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be676 #592
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
  task: bb0b53c0 task.stack: bb3ac000
  EIP: update_stack_state+0xd4/0x340
  EFLAGS: 00010002 CPU: 0
  EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570
  ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690
  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: fffe0ff0 DR7: 00000400
  Call Trace:
   ? unwind_next_frame+0xea/0x400
   ? __unwind_start+0xf5/0x180
   ? __save_stack_trace+0x81/0x160
   ? save_stack_trace+0x20/0x30
   ? __lock_acquire+0xfa5/0x12f0
   ? lock_acquire+0x1c2/0x230
   ? tick_periodic+0x3a/0xf0
   ? _raw_spin_lock+0x42/0x50
   ? tick_periodic+0x3a/0xf0
   ? tick_periodic+0x3a/0xf0
   ? debug_smp_processor_id+0x12/0x20
   ? tick_handle_periodic+0x23/0xc0
   ? local_apic_timer_interrupt+0x63/0x70
   ? smp_trace_apic_timer_interrupt+0x235/0x6a0
   ? trace_apic_timer_interrupt+0x37/0x3c
   ? strrchr+0x23/0x50
  Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83
  EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f
  CR2: 00000000000001f2
  ---[ end trace 0d147fd4aba8ff50 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

On x86-32, after decoding a frame pointer to get a regs address,
regs_size() dereferences the regs pointer when it checks regs->cs to see
if the regs are user mode.  This is dangerous because it's possible that
what looks like a decoded frame pointer is actually a corrupt value, and
we don't want the unwinder to make things worse.

Instead of calling regs_size() on an unsafe pointer, just assume they're
kernel regs to start with.  Later, once it's safe to access the regs, we
can do the user mode check and corresponding safety check for the
remaining two regs.
Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5ed8d8bb ("x86/unwind: Move common code into update_stack_state()")
Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

62dd86ac

powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology() · 6b2c08f9

由 Thiago Jung Bauermann 提交于 10月 04, 2017

It turns out that not all paths calling arch_update_cpu_topology() hold
cpu_hotplug_lock, but that's OK because those paths can't race with
any concurrent hotplug events.

Warnings were reported with the following trace:

  lockdep_assert_cpus_held
  arch_update_cpu_topology
  sched_init_domains
  sched_init_smp
  kernel_init_freeable
  kernel_init
  ret_from_kernel_thread

Which is safe because it's called early in boot when hotplug is not
live yet.

And also this trace:

  lockdep_assert_cpus_held
  arch_update_cpu_topology
  partition_sched_domains
  cpuset_update_active_cpus
  sched_cpu_deactivate
  cpuhp_invoke_callback
  cpuhp_down_callbacks
  cpuhp_thread_fun
  smpboot_thread_fn
  kthread
  ret_from_kernel_thread

Which is safe because it's called as part of CPU hotplug, so although
we don't hold the CPU hotplug lock, there is another thread driving
the CPU hotplug operation which does hold the lock, and there is no
race.

Thanks to tglx for deciphering it for us.

Fixes: 3e401f7a ("powerpc: Only obtain cpu_hotplug_lock if called by rtasd")
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6b2c08f9

powerpc/lib/sstep: Fix count leading zeros instructions · b0490a04

由 Sandipan Das 提交于 10月 10, 2017

According to the GCC documentation, the behaviour of __builtin_clz()
and __builtin_clzl() is undefined if the value of the input argument
is zero. Without handling this special case, these builtins have been
used for emulating the following instructions:
  * Count Leading Zeros Word (cntlzw[.])
  * Count Leading Zeros Doubleword (cntlzd[.])

This fixes the emulated behaviour of these instructions by adding an
additional check for this special case.

Fixes: 3cdfcbfd ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
Reviewed-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b0490a04

powerpc/livepatch: Fix livepatch stack access · e36a82ee

由 Kamalesh Babulal 提交于 9月 20, 2017

While running stress test with livepatch module loaded, kernel bug was
triggered.

  cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60]
  5:mon> t
  [c0000000eb9d3de0] c0000000eb9d3e30 (unreliable)
  [c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120
   --- Exception: 501 (Hardware Interrupt) at c000000000053040 livepatch_handler+0x4c/0x74
  [c0000000eb9d4120] 0000000057ac6e9d (unreliable)
  [d0000000089d9f78] 2e0965747962382e
  SP (965747962342e09) is in userspace

When an interrupt occurs during the livepatch_handler execution, it's
possible for the livepatch_stack and/or thread_info to be corrupted.
eg:

  Task A                        Interrupt Handler
  =========                     =================
  livepatch_handler:
  mr r0, r1
  ld r1, TI_livepatch_sp(r12)
                                hardware_interrupt_common:
                                  do_IRQ+0x8:
                                    mflr    r0          <- saved stack pointer is overwritten
                                    bl      _mcount
                                    ...
                                    std     r27,-40(r1) <- overwrite of thread_info()

  lis r2, STACK_END_MAGIC@h
  ori r2, r2, STACK_END_MAGIC@l
  ld  r12, -8(r1)

Fix the corruption by using r11 register for livepatch stack
manipulation, instead of shuffling task stack and livepatch stack into
r1 register. Using r11 register also avoids disabling/enabling irq's
while setting up the livepatch stack.
Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e36a82ee

09 10月, 2017 8 次提交

Revert commit ("net:add one common config...") · f4986d25

由 Ding Tianhong 提交于 8月 18, 2017

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLP) targeted toward
these affected Root Port, it will clear the bit4 in the PCIe
Device Control register, so the PCIe device drivers could
query PCIe configuration space to determine if it can send
TLPs to Root Port with the Relaxed Ordering Attributes set.

With this new flag  we don't need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe drivers
just like the commit 1a8b6d76 ("net:add one common config...") did,
so revert this commit.
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f4986d25

MIPS: math-emu: Remove pr_err() calls from fpu_emu() · ca8eb05b

由 Paul Burton 提交于 9月 08, 2017

The FPU emulator includes 2 calls to pr_err() which are triggered by
invalid instruction encodings for MIPSr6 cmp.cond.fmt instructions.
These cases are not kernel errors, merely invalid instructions which are
already handled by delivering a SIGILL which will provide notification
that something failed in cases where that makes sense.

In cases where that SIGILL is somewhat expected & being handled, for
example when crashme happens to generate one of the affected bad
encodings, the message is printed with no useful context about what
triggered it & spams the kernel log for no good reason.

Remove the pr_err() calls to make crashme run silently & treat the bad
encodings the same way we do others, with a SIGILL & no further kernel
log output.
Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
Fixes: f8c3c671 ("MIPS: math-emu: Add support for the CMP.condn.fmt R6 instruction")
Cc: linux-mips@linux-mips.org
Cc: stable <stable@vger.kernel.org> # v4.3+
Patchwork: https://patchwork.linux-mips.org/patch/17253/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

ca8eb05b

MIPS: Fix generic-board-config.sh for builds using O= · e1270575

由 Paul Burton 提交于 9月 03, 2017

When configuring the kernel using one of the generic MIPS defconfig
targets, the generic-board-config.sh script is used to check
requirements listed in board config fragments against a reference config
in order to determine which board config fragments to merge into the
final config.

When specifying O= to configure in a directory other than the kernel
source directory, this generic-board-config.sh script is invoked in the
directory that we are configuring in (ie. the directory that O equals),
and the path to the reference config is relative to the current
directory. The script then changes the current directory to the source
tree, which unfortunately breaks later access to the reference file
since its path is relative to a directory that is no longer the current
working directory. This results in configuration failing with errors
such as:

  $ make ARCH=mips O=tmp 32r2_defconfig
  make[1]: Entering directory '/home/pburton/src/linux/tmp'
  Using ../arch/mips/configs/generic_defconfig as base
  Merging ../arch/mips/configs/generic/32r2.config
  Merging ../arch/mips/configs/generic/eb.config
  grep: ./.config.32r2_defconfig: No such file or directory
  grep: ./.config.32r2_defconfig: No such file or directory
  The base file '.config' does not exist.  Exit.
  make[1]: *** [arch/mips/Makefile:505: 32r2_defconfig] Error 1
  make[1]: Leaving directory '/home/pburton/src/linux-ingenic/tmp'
  make: *** [Makefile:145: sub-make] Error 2

Fix this by avoiding changing the working directory in
generic-board-config.sh, instead using full paths to files under
$(srctree)/ where necessary.
Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
Fixes: 27e0d4b0 ("MIPS: generic: Allow filtering enabled boards by requirements")
Cc: linux-mips@linux-mips.org
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Patchwork: https://patchwork.linux-mips.org/patch/17231/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

e1270575

MIPS: Fix cmpxchg on 32b signed ints for 64b kernel with !kernel_uses_llsc · 133d68e0

由 Paul Burton 提交于 9月 01, 2017

Commit 8263db4d ("MIPS: cmpxchg: Implement __cmpxchg() as a
function") refactored our implementation of __cmpxchg() to be a function
rather than a macro, with the aim of making it easier to read & modify.
Unfortunately the commit breaks use of cmpxchg() for signed 32 bit
values when we have a 64 bit kernel with kernel_uses_llsc == false,
because:

 - In cmpxchg_local() we cast the old value to the type the pointer
   points to, and then to an unsigned long. If the pointer points to a
   signed type smaller than 64 bits then the old value will be sign
   extended to 64 bits. That is, bits beyond the size of the pointed to
   type will be set to 1 if the old value is negative. In the case of a
   signed 32 bit integer with a negative value, bits 63:32 will all be
   set.

 - In __cmpxchg_asm() we load the value from memory, ie. dereference the
   pointer, and store the value as an unsigned integer (__ret) whose
   size matches the pointer. For a 32 bit cmpxchg() this means we store
   the value in a u32, because the pointer provided to __cmpxchg_asm()
   by __cmpxchg() is of type volatile u32 *.

 - __cmpxchg_asm() then checks whether the value in memory (__ret)
   matches the provided old value, by comparing the two values. This
   results in the u32 being promoted to a 64 bit unsigned long to match
   the old argument - however because both types are unsigned the value
   is zero extended, which does not match the sign extension performed
   on the old value in cmpxchg_local() earlier.

This mismatch means that unfortunate cmpxchg() calls can incorrectly
fail for 64 bit kernels with kernel_uses_llsc == false. This is the case
on at least non-SMP Cavium Octeon kernels, which hardcode
kernel_uses_llsc in their cpu-feature-overrides.h header. Using a
v4.13-rc7 kernel configured using cavium_octeon_defconfig with SMP
manually disabled, this presents itself as oddity when we reach
userland - for example:

  can't run '/bin/mount': Text file busy
  can't run '/bin/mkdir': Text file busy
  can't run '/bin/mkdir': Text file busy
  can't run '/bin/mount': Text file busy
  can't run '/bin/hostname': Text file busy
  can't run '/etc/init.d/rcS': Text file busy
  can't run '/sbin/getty': Text file busy
  can't run '/sbin/getty': Text file busy

It appears that some part of the init process, which is in this case
buildroot's busybox init, is running successfully. It never manages to
reach the login prompt though, and complains about /sbin/getty being
busy repeatedly and indefinitely.

Fix this by casting the old value provided to __cmpxchg_asm() to an
appropriately sized unsigned integer, such that we consistently
zero-extend avoiding the mismatch. The __cmpxchg_small() case for 8 & 16
bit values is unaffected because __cmpxchg_small() already masks
provided values appropriately.
Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
Fixes: 8263db4d ("MIPS: cmpxchg: Implement __cmpxchg() as a function")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17226/
Cc: linux-mips@linux-mips.org
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

133d68e0

MIPS: loongson1: set default number of rx and tx queues for stmmac · 1b6ad6df

由 Kelvin Cheung 提交于 10月 06, 2017

Set the default number of RX and TX queues due to
the recent changes of stmmac driver.
Otherwise the ethernet will crash once it starts.
Signed-off-by: NKelvin Cheung <keguang.zhang@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17452/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

1b6ad6df

MIPS: bpf: Fix uninitialised target compiler error · 94c3390a

由 Matt Redfearn 提交于 9月 27, 2017

Compiling ebpf_jit.c with gcc 4.9 results in a (likely spurious)
compiler warning, as gcc has detected that the variable "target" may be
used uninitialised. Since -Werror is active, this is treated as an error
and causes a kernel build failure whenever CONFIG_MIPS_EBPF_JIT is
enabled.

arch/mips/net/ebpf_jit.c: In function 'build_one_insn':
arch/mips/net/ebpf_jit.c:1118:80: error: 'target' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
    emit_instr(ctx, j, target);
                                                                                ^
cc1: all warnings being treated as errors

Fix this by initialising "target" to 0. If it really is used
uninitialised this would result in a jump to 0 and a detectable run time
failure.
Signed-off-by: NMatt Redfearn <matt.redfearn@imgtec.com>
Fixes: b6bd53f9 ("MIPS: Add missing file for eBPF JIT.")
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v4.13+
Patchwork: https://patchwork.linux-mips.org/patch/17375/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

94c3390a

x86/alternatives: Fix alt_max_short macro to really be a max() · 6b32c126

由 Mathias Krause 提交于 10月 05, 2017

The alt_max_short() macro in asm/alternative.h does not work as
intended, leading to nasty bugs. E.g. alt_max_short("1", "3")
evaluates to 3, but alt_max_short("3", "1") evaluates to 1 -- not
exactly the maximum of 1 and 3.

In fact, I had to learn it the hard way by crashing my kernel in not
so funny ways by attempting to make use of the ALTENATIVE_2 macro
with alternatives where the first one was larger than the second
one.

According to [1] and commit dbe4058a ("x86/alternatives: Fix
ALTERNATIVE_2 padding generation properly") the right handed side
should read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the
macro work as intended.

While at it, fix up the comments regarding the additional "-", too.
It's not about gas' usage of s32 but brain dead logic of having a
"true" value of -1 for the < operator ... *sigh*

Btw., the one in asm/alternative-asm.h is correct. And, apparently,
all current users of ALTERNATIVE_2() pass same sized alternatives,
avoiding to hit the bug.

[1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMaxReviewed-and-tested-by: NBorislav Petkov <bp@suse.de>
Fixes: dbe4058a ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly")
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1507228213-13095-1-git-send-email-minipli@googlemail.com

6b32c126

x86/mm/64: Fix reboot interaction with CR4.PCIDE · 924c6b90

由 Andy Lutomirski 提交于 10月 08, 2017

Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set.  (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.)  The result is a GPF and
a hang instead of a reboot.

I didn't catch this in testing because neither my computer nor my VM
reboots this way.  I can trigger it with reboot=bios, though.

Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org

924c6b90

06 10月, 2017 4 次提交

ARC: [plat-hsdk]: Add reset controller node to manage ethernet reset · ab8eb7db

由 Eugeniy Paltsev 提交于 9月 22, 2017

DW ethernet controller on HSDK hangs sometimes after SW reset, so
add reset node to make possible to reset DW ethernet controller HW.
Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

ab8eb7db

arm64: Ensure fpsimd support is ready before userspace is active · ae2e972d

由 Suzuki K Poulose 提交于 10月 06, 2017

We register the pm/hotplug callbacks for FPSIMD as late_initcall,
which happens after the userspace is active (from initramfs via
populate_rootfs, a rootfs_initcall). Make sure we are ready even
before the userspace could potentially use it, by promoting to
a core_initcall.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

ae2e972d

arm64: Ensure the instruction emulation is ready for userspace · c0d8832e

由 Suzuki K Poulose 提交于 10月 06, 2017

We trap and emulate some instructions (e.g, mrs, deprecated instructions)
for the userspace. However the handlers for these are registered as
late_initcalls and the userspace could be up and running from the initramfs
by that time (with populate_rootfs, which is a rootfs_initcall()). This
could cause problems for the early applications ending up in failure
like :

[   11.152061] modprobe[93]: undefined instruction: pc=0000ffff8ca48ff4

This patch promotes the specific calls to core_initcalls, which are
guaranteed to be completed before we hit userspace.

Cc: stable@vger.kernel.org
Cc: Dave Martin <dave.martin@arm.com>
Cc: Matthias Brugger <mbrugger@suse.com>
Cc: James Morse <james.morse@arm.com>
Reported-by: NMatwey V. Kornilov <matwey.kornilov@gmail.com>
Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>

c0d8832e

powerpc/tm: Fix illegal TM state in signal handler · 044215d1

由 Gustavo Romero 提交于 8月 22, 2017

Currently it's possible that on returning from the signal handler
through the restore_tm_sigcontexts() code path (e.g. from a signal
caught due to a `trap` instruction executed in the middle of an HTM
block, or a deliberately constructed sigframe) an illegal TM state
(like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets
implicitly the MSR register from SRR1 register on return to userspace
it causes a TM Bad Thing exception.

That illegal state can be set (a) by a malicious user that disables
the TM bit by tweaking the bits in uc_mcontext before returning from
the signal handler or (b) by a sufficient number of context switches
occurring such that the load_tm counter overflows and TM is disabled
whilst in the signal handler.

This commit fixes the illegal TM state by ensuring that TM bit is
always enabled before we return from restore_tm_sigcontexts(). A small
comment correction is made as well.

Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: NGustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: NBreno Leitao <leitao@debian.org>
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

044215d1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功