提交 · 5e2f30b756a37bd80c5b0471d0e10d769ab2eb9a · openanolis / cloud-kernel

07 8月, 2017 3 次提交

KVM: nVMX: get rid of nested_get_page() · 5e2f30b7

由 David Hildenbrand 提交于 8月 03, 2017

nested_get_page() just sounds confusing. All we want is a page from G1.
This is even unrelated to nested.

Let's introduce kvm_vcpu_gpa_to_page() so we don't get too lengthy
lines.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
[Squash pasto fix from Wanpeng Li. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5e2f30b7

KVM: nVMX: INVPCID support · 90a2db6d

由 Paolo Bonzini 提交于 7月 27, 2017

Expose the "Enable INVPCID" secondary execution control to the guest
and properly reflect the exit reason.

In addition, before this patch the guest was always running with
INVPCID enabled, causing pcid.flat's "Test on INVPCID when disabled"
test to fail.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

90a2db6d

KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY and HV_X64_MSR_APIC_FREQUENCY · 72c139ba

由 Ladi Prosek 提交于 7月 26, 2017

It has been experimentally confirmed that supporting these two MSRs is one
of the necessary conditions for nested Hyper-V to use the TSC page. Modern
Windows guests are noticeably slower when they fall back to reading
timestamps from the HV_X64_MSR_TIME_REF_COUNT MSR instead of using the TSC
page.

The newly supported MSRs are advertised with the AccessFrequencyRegs
partition privilege flag and CPUID.40000003H:EDX[8] "Support for
determining timer frequencies is available" (both outside of the scope of
this KVM patch).
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NLadi Prosek <lprosek@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

72c139ba

05 8月, 2017 1 次提交

sparc64: Fix exception handling in UltraSPARC-III memcpy. · 0ede1c40

由 David S. Miller 提交于 8月 04, 2017

Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.

After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.

The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.

Fixes: ee841d0a ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: NMikael Pettersson <mikpelinux@gmail.com>
Tested-by: NMikael Pettersson <mikpelinux@gmail.com>
Reviewed-by: NSam Ravnborg <sam@ravnborg.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ede1c40

04 8月, 2017 5 次提交

arm64: avoid overflow in VA_START and PAGE_OFFSET · 82cd5880

由 Nick Desaulniers 提交于 8月 03, 2017

The bitmask used to define these values produces overflow, as seen by
this compiler warning:

arch/arm64/kernel/head.S:47:8: warning:
      integer overflow in preprocessor expression
  #elif (PAGE_OFFSET & 0x1fffff) != 0
         ^~~~~~~~~~~
arch/arm64/include/asm/memory.h:52:46: note:
      expanded from macro 'PAGE_OFFSET'
  #define PAGE_OFFSET             (UL(0xffffffffffffffff) << (VA_BITS -
1))
                                      ~~~~~~~~~~~~~~~~~~  ^

It would be preferrable to use GENMASK_ULL() instead, but it's not set
up to be used from assembly (the UL() macro token pastes UL suffixes
when not included in assembly sources).
Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: NYury Norov <ynorov@caviumnetworks.com>
Suggested-by: NMatthias Kaehlcke <mka@chromium.org>
Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

82cd5880

arm64: Fix potential race with hardware DBM in ptep_set_access_flags() · 6d332747

由 Catalin Marinas 提交于 7月 25, 2017

In a system with DBM (dirty bit management) capable agents there is a
possible race between a CPU executing ptep_set_access_flags() (maybe
non-DBM capable) and a hardware update of the dirty state (clearing of
PTE_RDONLY). The scenario:

a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old
   (PTE_AF clear)
b) ptep_set_access_flags() is called as a result of a read access and it
   needs to set the pte to writable, clean and young (PTE_AF set)
c) a DBM-capable agent, as a result of a different write access, is
   marking the entry as young (setting PTE_AF) and dirty (clearing
   PTE_RDONLY)

The current ptep_set_access_flags() implementation would set the
PTE_RDONLY bit in the resulting value overriding the DBM update and
losing the dirty state.

This patch fixes such race by setting PTE_RDONLY to the most permissive
(lowest value) of the current entry and the new one.

Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NSteve Capper <steve.capper@arm.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

6d332747

ARM: dts: tango4: Request RGMII RX and TX clock delays · 985333b0

由 Marc Gonzalez 提交于 7月 28, 2017

RX and TX clock delays are required. Request them explicitly.

Fixes: cad008b8 ("ARM: dts: tango4: Initial device trees")
Cc: stable@vger.kernel.org
Signed-off-by: NMarc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

985333b0

powerpc/64: Fix __check_irq_replay missing decrementer interrupt · 3db40c31

由 Nicholas Piggin 提交于 8月 01, 2017

If the decrementer wraps again and de-asserts the decrementer
exception while hard-disabled, __check_irq_replay() has a test to
notice the wrap when interrupts are re-enabled.

The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
because the decrementer interrupt was always the first one checked
after clearing the hard disable flag, but HMI check was moved ahead of
that, which introduced this bug.

This can cause a missed decrementer interrupt if we soft-disable
interrupts then take an HMI which is recorded in irq_happened, then
hard-disable interrupts for > 4s to wrap the decrementer.

Fixes: e0e0d6b7 ("powerpc/64: Replay hypervisor maintenance interrupt first")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3db40c31

powerpc/perf: POWER9 PMU stops after idle workaround · 09539f9b

由 Nicholas Piggin 提交于 7月 20, 2017

POWER9 DD2 PMU can stop after a state-loss idle in some conditions.

A solution is to set then clear MMCRA[60] after wake from state-loss
idle. MMCRA[60] is a non-architected bit, see the user manual for
details.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Acked-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

09539f9b

03 8月, 2017 6 次提交

KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" · 6550c4df

由 Wanpeng Li 提交于 7月 31, 2017

------------[ cut here ]------------
 WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
 CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
Call Trace:
  vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
  ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
  ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x8f/0x750
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which
means that tells the kernel to not make use of any IOAPICs that may be present
in the system.

Actually external_intr variable in nested_vmx_vmexit() is the req_int_win
variable passed from vcpu_enter_guest() which means that the L0's userspace
requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) &&
L0's userspace reqeusts an irq window) is true, so there is no interrupt which
L1 requires to inject to L2, we should not attempt to emualte "Acknowledge
interrupt on exit" for the irq window requirement in this scenario.

This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit"
if there is no L1 requirement to inject an interrupt to L2.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
[Added code comment to make it obvious that the behavior is not correct.
 We should do a userspace exit with open interrupt window instead of the
 nested VM exit.  This patch still improves the behavior, so it was
 accepted as a (temporary) workaround.]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

6550c4df

KVM: nVMX: mark vmcs12 pages dirty on L2 exit · c9f04407

由 David Matlack 提交于 8月 01, 2017

The host physical addresses of L1's Virtual APIC Page and Posted
Interrupt descriptor are loaded into the VMCS02. The CPU may write
to these pages via their host physical address while L2 is running,
bypassing address-translation-based dirty tracking (e.g. EPT write
protection). Mark them dirty on every exit from L2 to prevent them
from getting out of sync with dirty tracking.

Also mark the virtual APIC page and the posted interrupt descriptor
dirty when KVM is virtualizing posted interrupt processing.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

c9f04407

kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown · 8ca44e88

由 David Matlack 提交于 8月 01, 2017

According to the Intel SDM, software cannot rely on the current VMCS to be
coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12
flushes.

24.11.1 Software Use of Virtual-Machine Control Structures
...
  If a logical processor leaves VMX operation, any VMCSs active on
  that logical processor may be corrupted (see below). To prevent
  such corruption of a VMCS that may be used either after a return
  to VMX operation or on another logical processor, software should
  execute VMCLEAR for that VMCS before executing the VMXOFF instruction
  or removing power from the processor (e.g., as part of a transition
  to the S3 and S4 power states).
...

This fixes a "suspicious rcu_dereference_check() usage!" warning during
kvm_vm_release() because nested_release_vmcs12() calls
kvm_vcpu_write_guest_page() without holding kvm->srcu.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

8ca44e88

KVM: nVMX: do not pin the VMCS12 · 9f744c59

由 Paolo Bonzini 提交于 7月 27, 2017

Since the current implementation of VMCS12 does a memcpy in and out
of guest memory, we do not need current_vmcs12 and current_vmcs12_page
anymore.  current_vmptr is enough to read and write the VMCS12.

And David Matlack noted:

  This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
  VMCS12 page by using kvm_write_guest. nested_release_page() only marks
  the struct page dirty.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
[Added David Matlack's note and nested_release_page_clean() fix.]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

9f744c59

KVM: X86: init irq->level in kvm_pv_kick_cpu_op · ebd28fcb

由 Longpeng(Mike) 提交于 8月 02, 2017

'lapic_irq' is a local variable and its 'level' field isn't
initialized, so 'level' is random, it doesn't matter but
makes UBSAN unhappy:

UBSAN: Undefined behaviour in .../lapic.c:...
load of value 10 is not a valid value for type '_Bool'
...
Call Trace:
 [<ffffffff81f030b6>] dump_stack+0x1e/0x20
 [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55
 [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162
 [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm]
 [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm]
 [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm]
 [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm]
 [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel]
 [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel]
...
Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

ebd28fcb

KVM: X86: Fix loss of pending INIT due to race · f4ef1910

由 Wanpeng Li 提交于 8月 01, 2017

When SMP VM start, AP may lost INIT because of receiving INIT between
kvm_vcpu_ioctl_x86_get/set_vcpu_events.

       vcpu 0                             vcpu 1
                                   kvm_vcpu_ioctl_x86_get_vcpu_events
                                     events->smi.latched_init = 0
  send INIT to vcpu1
    set vcpu1's pending_events
                                   kvm_vcpu_ioctl_x86_set_vcpu_events
                                      if (events->smi.latched_init == 0)
                                        clear INIT in pending_events

This patch fixes it by just update SMM related flags if we are in SMM.

Thanks Peng Hao for the report and original commit message.
Reported-by: NPeng Hao <peng.hao2@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

f4ef1910

02 8月, 2017 5 次提交

ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge · d7a65c49

由 Gregory CLEMENT 提交于 8月 01, 2017

The number of pins in South Bridge is 30 and not 29. There is a fix for
the driver for the pinctrl, but a fix is also need at device tree level
for the GPIO.

Fixes: afda007f ("ARM64: dts: marvell: Add pinctrl nodes for Armada
3700")
Cc: <stable@vger.kernel.org>
Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com>

d7a65c49

powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check · f29bb786

由 Sergei Shtylyov 提交于 7月 29, 2017

of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Freescale MPC832x RDB board
code still only regards 0 as a failure indication -- fix it up.

Fixes: 7a4228bb ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: NScott Wood <oss@buserror.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f29bb786

KVM: async_pf: make rcu irq exit if not triggered from idle task · 337c017c

由 Wanpeng Li 提交于 8月 01, 2017

 WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1
 RIP: 0010:rcu_note_context_switch+0x207/0x6b0
 Call Trace:
  __schedule+0xda/0xba0
  ? kvm_async_pf_task_wait+0x1b2/0x270
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30
 RIP: 0010:__d_lookup_rcu+0x90/0x1e0

I encounter this when trying to stress the async page fault in L1 guest w/
L2 guests running.

Commit 9b132fbe (Add rcu user eqs exception hooks for async page
fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu
idle eqs when needed, to protect the code that needs use rcu.  However,
we need to call the pair even if the function calls schedule(), as seen
from the above backtrace.

This patch fixes it by informing the RCU subsystem exit/enter the irq
towards/away from idle for both n.halted and !n.halted.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

337c017c

KVM: nVMX: fixes to nested virt interrupt injection · b96fb439

由 Paolo Bonzini 提交于 7月 27, 2017

There are three issues in nested_vmx_check_exception:

1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported
by Wanpeng Li;

2) it should rebuild the interruption info and exit qualification fields
from scratch, as reported by Jim Mattson, because the values from the
L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes
a page fault, the EPT misconfig's exit qualification is incorrect).

3) CR2 and DR6 should not be written for exception intercept vmexits
(CR2 only for AMD).

This patch fixes the first two and adds a comment about the last,
outlining the fix.

Cc: Jim Mattson <jmattson@google.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b96fb439

KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 · 7313c698

由 Paolo Bonzini 提交于 7月 27, 2017

Do this in the caller of nested_vmx_vmexit instead.

nested_vmx_check_exception was doing a vmwrite to the vmcs02's
VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move
the field to vmcs12->vm_exit_intr_error_code.  However that isn't
possible on pre-Haswell machines.  Moving the vmcs12 write to the
callers fixes it.
Reported-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1,
 thanks to fengguang.wu@intel.com]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

7313c698

01 8月, 2017 2 次提交

arm64: Use arch_timer_get_rate when trapping CNTFRQ_EL0 · c6f97add

由 Marc Zyngier 提交于 7月 21, 2017

In an ideal world, CNTFRQ_EL0 always contains the timer frequency
for the kernel to use. Sadly, we get quite a few broken systems
where the firmware authors cannot be bothered to program that
register on all CPUs, and rely on DT to provide that frequency.

So when trapping CNTFRQ_EL0, make sure to return the actual rate
(as known by the kernel), and not CNTFRQ_EL0.
Acked-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

c6f97add

x86/hpet: Cure interface abuse in the resume path · bb68cfe2

由 Thomas Gleixner 提交于 7月 31, 2017

The HPET resume path abuses irq_domain_[de]activate_irq() to restore the
MSI message in the HPET chip for the boot CPU on resume and it relies on an
implementation detail of the interrupt core code, which magically makes the
HPET unmask call invoked via a irq_disable/enable pair. This worked as long
as the irq code did unconditionally invoke the unmask() callback. With the
recent changes which keep track of the masked state to avoid expensive
hardware access, this does not longer work. As a consequence the HPET timer
interrupts are not unmasked which breaks resume as the boot CPU waits
forever that a timer interrupt arrives.

Make the restore of the MSI message explicit and invoke the unmask()
function directly. While at it get rid of the pointless affinity setting as
nothing can change the affinity of the interrupt and the vector across
suspend/resume. The restore of the MSI message reestablishes the previous
affinity setting which is the correct one.

Fixes: bf22ff45 ("genirq: Avoid unnecessary low level irq function calls")
Reported-and-tested-by: NTomi Sarvela <tomi.p.sarvela@intel.com>
Reported-by: NMartin Peres <martin.peres@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: N"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: jeffy.chen@rock-chips.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos

bb68cfe2

31 7月, 2017 4 次提交

parisc: Define CONFIG_CPU_BIG_ENDIAN · 74ad3d28

由 Babu Moger 提交于 7月 06, 2017

While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN
to clear a byte.

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
 {
	return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
 }

Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.

Define CPU_BIG_ENDIAN for parisc architecture to fix it.
Signed-off-by: NBabu Moger <babu.moger@oracle.com>
Signed-off-by: NHelge Deller <deller@gmx.de>

74ad3d28

powerpc/64s: Fix stack setup in watchdog soft_nmi_common() · cc491f1d

由 Nicholas Piggin 提交于 7月 29, 2017

The watchdog soft-NMI exception stack setup loads a stack pointer
twice, which is an obvious error. It ends up using the system reset
interrupt (true-NMI) stack, which is also a bug because the watchdog
could be preempted by a system reset interrupt that overwrites the
NMI stack.

Change the soft-NMI to use the "emergency stack". The current kernel
stack is not used, because of the longer-term goal to prevent
asynchronous stack access using soft-disable.

Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cc491f1d

parisc: Increase thread and stack size to 32kb · 8f8201df

由 Helge Deller 提交于 7月 31, 2017

Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
the default size of 16k. The reason why stack usage suddenly grew is yet
unknown.
Signed-off-by: NHelge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: NHelge Deller <deller@gmx.de>

8f8201df

parisc: Handle vma's whose context is not current in flush_cache_range · 13d57093

由 John David Anglin 提交于 7月 30, 2017

In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
statement in flush_cache_range() during a system shutdown:

kernel BUG at arch/parisc/kernel/cache.c:595!
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx

 IAOQ[0]: flush_cache_range+0x144/0x148
 IAOQ[1]: flush_cache_page+0x0/0x1a8
 RP(r2): flush_cache_range+0xec/0x148
Backtrace:
 [<00000000402910ac>] unmap_page_range+0x84/0x880
 [<00000000402918f4>] unmap_single_vma+0x4c/0x60
 [<0000000040291a18>] zap_page_range_single+0x110/0x160
 [<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
 [<000000004026ccd8>] truncate_pagecache+0x50/0xa8
 [<000000004026cd84>] truncate_setsize+0x54/0x70
 [<000000004033d534>] put_aio_ring_file+0x44/0xb0
 [<000000004033d5d8>] aio_free_ring+0x38/0x140
 [<000000004033d714>] free_ioctx+0x34/0xa8
 [<00000000401b0028>] process_one_work+0x1b8/0x4d0
 [<00000000401b04f4>] worker_thread+0x1b4/0x648
 [<00000000401b9128>] kthread+0x1b0/0x208
 [<0000000040150020>] end_fault_vector+0x20/0x28
 [<0000000040639518>] nf_ip_reroute+0x50/0xa8
 [<0000000040638ed0>] nf_ip_route+0x10/0x78
 [<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8

CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
Backtrace:
 [<0000000040163bf0>] show_stack+0x20/0x38
 [<0000000040688480>] dump_stack+0xa8/0x120
 [<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
 [<0000000040164d0c>] handle_interruption+0xa24/0xa48

This patch modifies flush_cache_range() to handle non current contexts.
In as much as this occurs infrequently, the simplest approach is to
flush the entire cache when this happens.
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: NHelge Deller <deller@gmx.de>

13d57093

30 7月, 2017 1 次提交

cpufreq: x86: Make scaling_cur_freq behave more as expected · 4815d3c5

由 Rafael J. Wysocki 提交于 7月 28, 2017

After commit f8475cef "x86: use common aperfmperf_khz_on_cpu() to
calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
in sysfs only behaves as expected on x86 with APERF/MPERF registers
available when it is read from at least twice in a row. The value
returned by the first read may not be meaningful, because the
computations in there use cached values from the previous iteration
of aperfmperf_snapshot_khz() which may be stale.

To prevent that from happening, modify arch_freq_get_on_cpu() to
call aperfmperf_snapshot_khz() twice, with a short delay between
these calls, if the previous invocation of aperfmperf_snapshot_khz()
was too far back in the past (specifically, more that 1s ago).

Also, as pointed out by Doug Smythies, aperf_delta is limited now
and the multiplication of it by cpu_khz won't overflow, so simplify
the s->khz computations too.

Fixes: f8475cef "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF"
Reported-by: NDoug Smythies <dsmythies@telus.net>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

4815d3c5

29 7月, 2017 1 次提交

ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk · fce8dc5e

由 Geert Uytterhoeven 提交于 7月 12, 2017

Simon Horman reported that Koelsch and Lager hang during boot, and
bisected this to commit 1c3c5eab ("sched/core: Enable
might_sleep() and smp_processor_id() checks early").

The da9063/da9210 regulator quirk for R-Car Gen2 boards uses a bus
notifier, and unregisters the notifier when it is no longer needed.
However, a notifier must not be unregistered from within the call chain.

This bug went unnoticed, as blocking_notifier_chain_unregister() didn't
take the semaphore during early boot.  The aforementioned commit changed
that behavior, leading to a deadlock.

Fix this by removing the call to bus_unregister_notifier(), and keeping
local completion state instead.
Reported-by: NSimon Horman <horms+renesas@verge.net.au>
Fixes: 663fbb52 ("ARM: shmobile: R-Car Gen2: Add da9063/da9210 regulator quirk")
Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: NSimon Horman <horms+renesas@verge.net.au>

fce8dc5e

28 7月, 2017 12 次提交

powerpc/powernv/pci: Return failure for some uses of dma_set_mask() · 253fd51e

由 Alistair Popple 提交于 7月 26, 2017

Commit 8e3f1b1d ("powerpc/powernv/pci: Enable 64-bit devices to access
>4GB DMA space") introduced the ability for PCI device drivers to request a
DMA mask between 64 and 32 bits and actually get a mask greater than
32-bits. However currently if certain machine configuration dependent
conditions are not meet the code silently falls back to a 32-bit mask.

This makes it hard for device drivers to detect which mask they actually
got. Instead we should return an error when the request could not be
fulfilled which allows drivers to either fallback or implement other
workarounds as documented in DMA-API-HOWTO.txt.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Acked-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

253fd51e

powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler · 65c5ec11

由 Michael Ellerman 提交于 7月 26, 2017

Historically the boot wrapper was always built 32-bit big endian, even
for 64-bit kernels. That was because old firmwares didn't necessarily
support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
uses CROSS32CC for compilation.

However when we added 64-bit little endian support, we also added
support for building the boot wrapper 64-bit. However we kept using
CROSS32CC, because in most cases it is just CC and everything works.

However if the user doesn't specify CROSS32_COMPILE (which no one ever
does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
becomes just "gcc". On native systems that is probably OK, but if we're
cross building it definitely isn't, leading to eg:

gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
gcc: error: unrecognized argument in option ‘-mabi=elfv2’
gcc: error: unrecognized command line option ‘-mlittle-endian’
make: *** [zImage] Error 2

To fix it, stop using CROSS32CC, because we may or may not be building
32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
CROSS32_COMPILE if it's set and we're building for 32-bit.

Fixes: 147c0516 ("powerpc/boot: Add support for 64bit little endian wrapper")
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NCyril Bur <cyrilbur@gmail.com>

65c5ec11

powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU · 7b7622bb

由 Michael Ellerman 提交于 7月 27, 2017

In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
CPU, which means it has to run *on* the boot CPU.

In the past we ensured it ran on the boot CPU by changing the CPU
affinity mask of current directly. That was removed in commit
6d11b87d ("powerpc/smp: Replace open coded task affinity logic"),
and replaced with a work queue call.

Unfortunately using a work queue leads to a lockdep warning, now that
the CPU hotplug lock is a regular semaphore:

  ======================================================
  WARNING: possible circular locking dependency detected
  ...
  kworker/0:1/971 is trying to acquire lock:
   (cpu_hotplug_lock.rw_sem){++++++}, at: [<c000000000100974>] apply_workqueue_attrs+0x34/0xa0

  but task is already holding lock:
   ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
  ...
       CPU0                    CPU1
       ----                    ----
  lock((&wfc.work));
                               lock(cpu_hotplug_lock.rw_sem);
                               lock((&wfc.work));
  lock(cpu_hotplug_lock.rw_sem);

Although the deadlock can't happen in practice, because
smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
lockdep can't tell that.

Luckily in commit 8fb12156 ("init: Pin init task to the boot CPU,
initially") tglx changed the generic code to pin init to the boot CPU
to begin with. The unpinning of init from the boot CPU happens in
sched_init_smp(), which is called after smp_cpus_done().

So smp_cpus_done() is always called on the boot CPU, which means we
don't need the work queue call at all - and the lockdep warning goes
away.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

7b7622bb

arm64: mmu: Place guard page after mapping of kernel image · 92bbd16e

由 Will Deacon 提交于 7月 24, 2017

The vast majority of virtual allocations in the vmalloc region are followed
by a guard page, which can help to avoid overruning on vma into another,
which may map a read-sensitive device.

This patch adds a guard page to the end of the kernel image mapping (i.e.
following the data/bss segments).

Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

92bbd16e

x86/boot: Disable the address-of-packed-member compiler warning · 20c6c189

由 Matthias Kaehlcke 提交于 7月 25, 2017

The clang warning 'address-of-packed-member' is disabled for the general
kernel code, also disable it for the x86 boot code.

This suppresses a bunch of warnings like this when building with clang:

./arch/x86/include/asm/processor.h:535:30: warning: taking address of
  packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
  unaligned pointer value [-Waddress-of-packed-member]
    return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                ^~~~~~~~~~~~~~~~~~~
./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
  'this_cpu_read_stable'
    #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                    ^~~
./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
  'percpu_stable_op'
    : "p" (&(var)));
             ^~~
Signed-off-by: NMatthias Kaehlcke <mka@chromium.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

20c6c189

powerpc/tm: Fix saving of TM SPRs in core dump · cd63f3cf

由 Gustavo Romero 提交于 7月 19, 2017

Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
TFIAR, TEXASR) to the thread struct, unless the process is currently
inside a suspended transaction.

If the process is core dumping, and the TM SPRs have changed since the
last time the process was context switched, then we will save stale
values of the TM SPRs to the core dump.

Fix it by saving the live register state to the thread struct in that
case.

Fixes: 08e1c01d ("powerpc/ptrace: Enable support for TM SPR state")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: NGustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cd63f3cf

powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries · c9c98bc5

由 Oliver O'Halloran 提交于 7月 28, 2017

The Radix MMU translation tree as defined in ISA v3.0 contains two
different types of entry, directories and leaves. Leaves are
identified by _PAGE_PTE being set.

The formats of the two entries are different, with the directory
entries containing no spare bits for use by software. In particular
the bit we use for _PAGE_DEVMAP is not reserved for software, and is
part of the NLB (Next Level Base) field, essentially the address of
the next level in the tree.

Note that the Linux pte_t is not == _PAGE_PTE. A huge page pmd
entry (or devmap!) is also a leaf and so has _PAGE_PTE set, even
though we use a pmd_t for it in Linux.

The fix is to ensure that the pmd/pte_devmap() confirm they are
looking at a leaf entry (_PAGE_PTE) as well as checking _PAGE_DEVMAP.

Fixes: ebd31197 ("powerpc/mm: Add devmap support for ppc64")
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
Tested-by: NLaurent Vivier <lvivier@redhat.com>
Tested-by: NJose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add a comment in the code and flesh out change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c9c98bc5

arm64: defconfig: enable missing HWSPINLOCK · 3ac8093c

由 Georgi Djakov 提交于 7月 19, 2017

The hardware spinlock drivers now depend on HWSPINLOCK (instead of
selecting it), so we need to explicitly enable it after commit
35fc8a07 ("Make HWSPINLOCK a menuconfig to ease disabling")

Without HWSPINLOCK, various drivers are left with unsatisfied
dependencies and Qcom boards using shared memory based communication
to request regulators are failing to boot and mount rootfs.

Fix this by explicitly enabling HWSPINLOCK in defconfig.
Signed-off-by: NGeorgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: NAndy Gross <andy.gross@linaro.org>

3ac8093c

ARM: pxa: select both FB and FB_W100 for eseries · 1d20d8a9

由 Arnd Bergmann 提交于 3月 19, 2014

We get a link error trying to access the w100fb_gpio_read/write
functions from the platform when the driver is a loadable module
or not built-in, so the platform already uses 'select' to hard-enable
the driver.

However, that fails if the framebuffer subsystem is disabled
altogether.

I've considered various ways to fix this properly, but they
all seem like too much work or too risky, so this simply
adds another 'select' to force the subsystem on as well.

Fixes: 82427de2 ("ARM: pxa: PXA_ESERIES depends on FB_W100.")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

1d20d8a9

ARM: ixp4xx: fix ioport_unmap definition · c4caa8db

由 Arnd Bergmann 提交于 6月 10, 2016

An empty macro definition can cause unexpected behavior, in
case of the ixp4xx ioport_unmap, we get two warnings:

drivers/net/wireless/marvell/libertas/if_cs.c: In function 'if_cs_release':
drivers/net/wireless/marvell/libertas/if_cs.c:826:3: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
ioport_unmap(card->iobase);
drivers/vfio/pci/vfio_pci_rdwr.c: In function 'vfio_pci_vga_rw':
drivers/vfio/pci/vfio_pci_rdwr.c:230:15: error: the omitted middle operand in ?: will always be 'true', suggest explicit middle operand [-Werror=parentheses]
is_ioport ? ioport_unmap(iomem) : iounmap(iomem);

This uses an inline function to define the macro in a safer way.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NKrzysztof Halasa <khalasa@piap.pl>

c4caa8db

ARM: ep93xx: use ARM_PATCH_PHYS_VIRT correctly · cd5bad41

由 Arnd Bergmann 提交于 3月 26, 2014

Just like ARCH_MULTIPLATFORM, we want to use ARM_PATCH_PHYS_VIRT
when possible, but that fails for NOMMU or XIP_KERNEL configurations.
Using 'imply' instead of 'select' gets this right and only uses
the symbol when we don't have to hardcode the address anyway.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NAlexander Sverdlin <alexander.sverdlin@gmail.com>

cd5bad41

ARM: mmp: mark usb_dma_mask as __maybe_unused · 1c1953f3

由 Arnd Bergmann 提交于 1月 15, 2016

This variable may be used by some devices that each have their
on Kconfig symbol, or by none of them, and that causes a build
warning:

arch/arm/mach-mmp/devices.c:241:12: error: 'usb_dma_mask' defined but not used [-Werror=unused-variable]

Marking it __maybe_unused avoids the warning.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

1c1953f3

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功