提交 · a9f8094aaedc7bc21e9232ea5120eaf2e0c824fd · openanolis / cloud-kernel

27 2月, 2016 1 次提交

Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()" · 6c777e87

由 Bjorn Helgaas 提交于 2月 17, 2016

991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug.

Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not
v4.3 and bisected it to 991de2e5.  Sunjin reported that the RocketRAID
272x driver worked in v4.2 but not v4.3.  In both cases booting with
"pci=routirq" is a workaround.

I think the problem is that after 991de2e5, we no longer call
pcibios_enable_irq() for upstream bridges.  Prior to 991de2e5, when a
driver called pci_enable_device(), we recursively called
pcibios_enable_irq() for upstream bridges via pci_enable_bridge().

After 991de2e5, we call pcibios_enable_irq() from pci_device_probe()
instead of the pci_enable_device() path, which does *not* call
pcibios_enable_irq() for upstream bridges.

Revert 991de2e5 to fix these driver regressions.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Reported-and-tested-by: NОлег Мороз <oleg.moroz@mcc.vniiem.ru>
Reported-by: NSunjin Yang <fan4326@gmail.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NRafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>

6c777e87

25 2月, 2016 2 次提交

KVM: x86: MMU: fix ubsan index-out-of-range warning · 17e4bce0

由 Mike Krinkin 提交于 2月 24, 2016

Ubsan reports the following warning due to a typo in
update_accessed_dirty_bits template, the patch fixes
the typo:

[  168.791851] ================================================================================
[  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
[  168.791866] index 4 is out of range for type 'u64 [4]'
[  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
[  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
[  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
[  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
[  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
[  168.791891] Call Trace:
[  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
[  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
[  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
[  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
[  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
[  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
[  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
[  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
[  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
[  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
[  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
[  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
[  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
[  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
[  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
[  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
[  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
[  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
[  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
[  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
[  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
[  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
[  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
[  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
[  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
[  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
[  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
[  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
[  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
[  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
[  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
[  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
[  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
[  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
[  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
[  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
[  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
[  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
[  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
[  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
[  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
[  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
[  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
[  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
[  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
[  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
[  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
[  168.792780] ================================================================================
Signed-off-by: NMike Krinkin <krinkin.m.u@gmail.com>
Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

17e4bce0

arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2 · fd451b90

由 Marc Zyngier 提交于 2月 17, 2016

The GICv3 architecture spec says:

Writing to the active priority registers in any order other than
the following order will result in UNPREDICTABLE behavior:
- ICH_AP0R<n>_EL2.
- ICH_AP1R<n>_EL2.

So let's not pointlessly go against the rule...
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

fd451b90

24 2月, 2016 10 次提交

KVM: x86: fix conversion of addresses to linear in 32-bit protected mode · 0c1d77f4

由 Paolo Bonzini 提交于 2月 19, 2016

Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c",
2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
case actually could modify "la", but after the patch this change is
not reflected in *linear.

The bug is visible whenever a non-zero segment base causes the linear
address to wrap around the 4GB mark.

Fixes: e8dd2d2d
Cc: stable@vger.kernel.org
Reported-by: NAurelien Jarno <aurelien@aurel32.net>
Tested-by: NAurelien Jarno <aurelien@aurel32.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0c1d77f4

KVM: x86: fix missed hardware breakpoints · 172b2386

由 Paolo Bonzini 提交于 2月 10, 2016

Sometimes when setting a breakpoint a process doesn't stop on it.
This is because the debug registers are not loaded correctly on
VCPU load.

The following simple reproducer from Oleg Nesterov tries using debug
registers in two threads.  To see the bug, run a 2-VCPU guest with
"taskset -c 0" and run "./bp 0 1" inside the guest.

    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <asm/debugreg.h>
    #include <assert.h>

    #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

    unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
    {
        unsigned long dr7;

        dr7 = ((len | type) & 0xf)
            << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
        if (enable)
            dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));

        return dr7;
    }

    int write_dr(int pid, int dr, unsigned long val)
    {
        return ptrace(PTRACE_POKEUSER, pid,
                offsetof (struct user, u_debugreg[dr]),
                val);
    }

    void set_bp(pid_t pid, void *addr)
    {
        unsigned long dr7;
        assert(write_dr(pid, 0, (long)addr) == 0);
        dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
        assert(write_dr(pid, 7, dr7) == 0);
    }

    void *get_rip(int pid)
    {
        return (void*)ptrace(PTRACE_PEEKUSER, pid,
                offsetof(struct user, regs.rip), 0);
    }

    void test(int nr)
    {
        void *bp_addr = &&label + nr, *bp_hit;
        int pid;

        printf("test bp %d\n", nr);
        assert(nr < 16); // see 16 asm nops below

        pid = fork();
        if (!pid) {
            assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
            kill(getpid(), SIGSTOP);
            for (;;) {
                label: asm (
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                    "nop; nop; nop; nop;"
                );
            }
        }

        assert(pid == wait(NULL));
        set_bp(pid, bp_addr);

        for (;;) {
            assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
            assert(pid == wait(NULL));

            bp_hit = get_rip(pid);
            if (bp_hit != bp_addr)
                fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                    bp_hit - &&label, nr);
        }
    }

    int main(int argc, const char *argv[])
    {
        while (--argc) {
            int nr = atoi(*++argv);
            if (!fork())
                test(nr);
        }

        while (wait(NULL) > 0)
            ;
        return 0;
    }

Cc: stable@vger.kernel.org
Suggested-by: NNadav Amit <namit@cs.technion.ac.il>
Reported-by: NAndrey Wagin <avagin@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

172b2386

arm/arm64: KVM: Feed initialized memory to MMIO accesses · 1d6a8212

由 Marc Zyngier 提交于 2月 15, 2016

On an MMIO access, we always copy the on-stack buffer info
the shared "run" structure, even if this is a read access.
This ends up leaking up to 8 bytes of uninitialized memory
into userspace, depending on the size of the access.

An obvious fix for this one is to only perform the copy if
this is an actual write.
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

1d6a8212

arc: SMP: CONFIG_ARC_IPI_DBG cleanup · 9ef2d8be

由 Valentin Rothberg 提交于 2月 24, 2016

Previous Commit ("ARC: SMP: No need for CONFIG_ARC_IPI_DBG") removed
the Kconfig option ARC_IPI_DBG.  Remove the last reference on this
option.
Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

9ef2d8be

ARC: SMP: No need for CONFIG_ARC_IPI_DBG · d73b73f5

由 Vineet Gupta 提交于 2月 19, 2016

This was more relevant during SMP bringup.

The warning for bogus msg better be visible always.
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

d73b73f5

ARCv2: Elide sending new cross core intr if receiver didn't ack prev · 3dea30ca

由 Vineet Gupta 提交于 2月 19, 2016

ARConnect/MCIP IPI sending has a retry-wait loop in case caller had
not seen a previous such interrupt. Turns out that it is not needed at
all. Linux cross core calling allows coalescing multiple IPIs to same
receiver - it is fine as long as there is one.

This logic is built into upper layer already, at a higher level of
abstraction. ipi_send_msg_one() sets the actual msg payload, but it only
calls MCIP IPI sending if msg holder was empty (using
atomic-set-new-and-get-old construct). Thus it is unlikely that the
retry-wait looping was ever getting exercised at all.

Cc: Chuck Jordan <cjordan@synopsys.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

3dea30ca

V
ARCv2: SMP: Push IPI_IRQ into IPI provider · 96817879
由 Vineet Gupta 提交于 2月 23, 2016
```
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
```
96817879

ARC: [intc-compact] Remove IPI setup from ARCompact port · dbcbc7e7

由 Vineet Gupta 提交于 1月 28, 2016

There is no real ARC700 based SMP SoC so remove IPI definition.
EZChip's SMP ARC700 is going to use a different intc and IPI provider
anyways.
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

dbcbc7e7

ARCv2: SMP: Emulate IPI to self using software triggered interrupt · bb143f81

由 Vineet Gupta 提交于 2月 23, 2016

ARConnect/MCIP Inter-Core-Interrupt module can't send interrupt to
local core. So use core intc capability to trigger software
interrupt to self, using an unsued IRQ #21.

This showed up as csd deadlock with LTP trace_sched on a dual core
system. This test acts as scheduler fuzzer, triggering all sorts of
schedulting activity. Trouble starts with IPI to self, which doesn't get
delivered (effectively lost due to H/w capability), but the msg intended
to be sent remain enqueued in per-cpu @ipi_data.

All subsequent IPIs to this core from other cores get elided due to the
IPI coalescing optimization in ipi_send_msg_one() where a pending msg
implies an IPI already sent and assumes other core is yet to ack it.
After the elided IPI, other core simply goes into csd_lock_wait()
but never comes out as this core never sees the interrupt.

Fixes STAR 9001008624

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>        [4.2]
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

bb143f81

x86: fix SMAP in 32-bit environments · de9e478b

由 Linus Torvalds 提交于 2月 23, 2016

In commit 11f1a4b9 ("x86: reorganize SMAP handling in user space
accesses") I changed how the stac/clac instructions were generated
around the user space accesses, which then made it possible to do
batched accesses efficiently for user string copies etc.

However, in doing so, I completely spaced out, and didn't even think
about the 32-bit case. And nobody really even seemed to notice, because
SMAP doesn't even exist until modern Skylake processors, and you'd have
to be crazy to run 32-bit kernels on a modern CPU.

Which brings us to Andy Lutomirski.

He actually tested the 32-bit kernel on new hardware, and noticed that
it doesn't work. My bad. The trivial fix is to add the required
uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.

I feel a bit bad about this patch, just because that header file really
should be cleaned up to avoid all the duplicated code in it, and this
commit just expands on the problem. But this just fixes the bug without
any bigger cleanup surgery.
Reported-and-tested-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de9e478b

23 2月, 2016 1 次提交

arc: get rid of DEVTMPFS dependency on INITRAMFS_SOURCE · 3e5177c1

由 Alexey Brodkin 提交于 2月 20, 2016

Even though DEVTMPFS is required when our pre-built initramfs
is used it is not the case in general. It is perfectly possible
to use initramfs with device nodes already populated or there
could be other usages, see discussion below for more detials:
http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/37819/focus=37821

This change removes mentioned dependency from arch/arc/Kconfig
updating instead those defconfigs that are usually used with this
kind of pre-build initramfs.

And while at it all touched defconfigs were regenerated via
savedefconfig and some options were removed:
 * USB is selected by other options implicitly
 * VGA_CONSOLE is disableb for ARC since
   031e29b5
 * EXT3_FS automatically selects EXT4_FS
 * MTDxxx and JFFS2_FS make no sense for AXS because
   AXS NAND controller is not upstreamed
 * NET_OSCI_LAN is not in upstream as well
 * ARCPGU_xxx options make no sense because ARC PGU is not yet
   in upstream and when it gets there all config options would
   be taken from devicetree
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

3e5177c1

22 2月, 2016 4 次提交

s390/fpu: signals vs. floating point control register · 1b17cb79

由 Martin Schwidefsky 提交于 2月 19, 2016

git commit 904818e2
"s390/kernel: introduce fpu-internal.h with fpu helper functions"
introduced the fpregs_store / fp_regs_load helper. These function
fail to save and restore the floating pointer control registers.

The effect is that the FPC is not correctly handled on signal
delivery and signal return.

Cc: stable@vger.kernel.org # 4.4
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1b17cb79

s390/compat: correct restore of high gprs on signal return · 342300cc

由 Martin Schwidefsky 提交于 2月 19, 2016

git commit 80703617
"s390: add support for vector extension"
broke 31-bit compat processes in regard to signal handling.

The restore_sigregs_ext32() function is used to restore the additional
elements from the user space signal frame. Among the additional elements
are the upper registers halves for 64-bit register support for 31-bit
processes. The copy_from_user that is used to retrieve the high-gprs
array from the user stack uses an incorrect length, 8 bytes instead of
64 bytes. This causes incorrect upper register halves to get loaded.

Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

342300cc

powerpc/mm/hash: Clear the invalid slot information correctly · 9ab3ac23

由 Aneesh Kumar K.V 提交于 2月 20, 2016

We can get a hash pte fault with 4k base page size and find the pte
already inserted with 64K base page size. In that case we need to clear
the existing slot information from the old pte. Fix this correctly

With THP, we also clear the slot information with respect to all
the 64K hash pte mapping that 16MB page. They are all invalid
now. This make sure we don't find the slot valid when we fault with
4k base page size. Finding the slot valid should not result in any wrong
behavior because we do check again in hash page table for the validity.
But we can avoid that check completely.

Fixes: a43c0eb8 ("powerpc/mm: Convert 4k hash insert to C")
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9ab3ac23

powerpc/eeh: Fix partial hotplug criterion · f6bf0fa1

由 Gavin Shan 提交于 2月 12, 2016

During error recovery, the device could be removed as part of the
partial hotplug. The criterion used to come with partial hotplug
is: if the device driver provides error_detected(), slot_reset()
and resume() callbacks, it's immune from hotplug. Otherwise,
it's going to experience partial hotplug during EEH recovery. But
the criterion isn't correct enough: mlx4_core driver for Mellanox
adapters provides error_detected(), slot_reset() callbacks, but
resume() isn't there. Those Mellanox adapters won't be to involved
in the partial hotplug.

This fixes the criterion to a practical one: adpater with driver
that provides error_detected(), slot_reset() will be immune from
partial hotplug. resume() isn't mandatory.

Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion")
Cc: stable@vger.kernel.org #v4.4+
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f6bf0fa1

19 2月, 2016 3 次提交

arm64: mm: allow the kernel to handle alignment faults on user accesses · 52d7523d

由 EunTaik Lee 提交于 2月 16, 2016

Although we don't expect to take alignment faults on access to normal
memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into
system calls, leading to things like get_user accessing device memory.

Rather than OOPS the kernel, allow any exception fixups to run and
return something like -EFAULT back to userspace. This makes the
behaviour more consistent with userspace, even though applications with
access to device mappings can easily cause other issues if they try
hard enough.
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NEun Taik Lee <eun.taik.lee@samsung.com>
[will: dropped __kprobes annotation and rewrote commit mesage]
Signed-off-by: NWill Deacon <will.deacon@arm.com>

52d7523d

arm64: kbuild: make "make install" not depend on vmlinux · 8684fa3e

由 Masahiro Yamada 提交于 2月 19, 2016

For the same reason as commit 19514fc6 ("arm, kbuild: make "make
install" not depend on vmlinux"), the install targets should never
trigger the rebuild of the kernel.
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

8684fa3e

mm, x86: fix pte_page() crash in gup_pte_range() · 457a98b0

由 Hugh Dickins 提交于 2月 17, 2016

Commit 3565fce3 ("mm, x86: get_user_pages() for dax mappings") has
moved up the pte_page(pte) in x86's fast gup_pte_range(), for no
discernible reason: put it back where it belongs, after the pte_flags
check and the pfn_valid cross-check.

That may be the cause of the NULL pointer dereference in
gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a
qemu-kvm based VM.
Signed-off-by: NHugh Dickins <hughd@google.com>
Reported-by: NMichael Long <Harn-Solo@gmx.de>
Tested-by: NMichael Long <Harn-Solo@gmx.de>
Acked-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

457a98b0

18 2月, 2016 6 次提交

ARCv2: boot report CCMs (Closely Coupled Memories) · a150b085

由 Vineet Gupta 提交于 2月 16, 2016

- ARCv2 uses a seperate BCR for {I,D}CCM base address:
  ARCompact encoded both base/size in same BCR

- Size encoding in common BCR is different for ARCompact/ARCv2
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

a150b085

V
ARCv2: boot print Low Latency Memory · 98341f7d
由 Vineet Gupta 提交于 2月 15, 2016
```
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
```
98341f7d

ARC: Assume multiplier is always present · 0eca6fdb

由 Vineet Gupta 提交于 2月 16, 2016

It is unlikely that designs running Linux will not have multiplier.
Further the current support is not complete as tool don't generate a
multilib w/o multiplier.
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>

0eca6fdb

x86/mm: Fix vmalloc_fault() to handle large pages properly · f4eafd8b

由 Toshi Kani 提交于 2月 17, 2016

A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:

     BUG: unable to handle kernel paging request at ffff880840000ff8
     IP: vmalloc_fault+0x1be/0x300
     PGD c7f03a067 PUD 0
     Oops: 0000 [#1] SM
     Call Trace:
        __do_page_fault+0x285/0x3e0
        do_page_fault+0x2f/0x80
        ? put_prev_entity+0x35/0x7a0
        page_fault+0x28/0x30
        ? memcpy_erms+0x6/0x10
        ? schedule+0x35/0x80
        ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
        ? schedule_timeout+0x183/0x240
        btt_log_read+0x63/0x140 [nd_btt]
         :
        ? __symbol_put+0x60/0x60
        ? kernel_read+0x50/0x80
        SyS_finit_module+0xb9/0xf0
        entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes.  pgd_ctor() sets the kernel's PGD entries to
user's during fork().  When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it.  If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.

Following changes are made to vmalloc_fault().

64-bit:

 - No change for the PGD sync operation as it handles large
   pages already.
 - Add pud_huge() and pmd_huge() to the validation code to
   handle large pages.
 - Change pud_page_vaddr() to pud_pfn() since an ioremap range
   is not directly mapped (while the if-statement still works
   with a bogus addr).
 - Change pmd_page() to pmd_pfn() since an ioremap range is not
   backed by struct page (while the if-statement still works
   with a bogus addr).

32-bit:
 - No change for the sync operation since the index3 PGD entry
   covers the entire vmalloc range, which is always valid.
   (A separate change to sync PGD entry is necessary if this
    memory layout is changed regardless of the page size.)
 - Add pmd_huge() to the validation code to handle large pages.
   This is for completeness since vmalloc_fault() won't happen
   in ioremap'd ranges as its PGD entry is always valid.
Reported-by: NHenning Schild <henning.schild@siemens.com>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Acked-by: NBorislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f4eafd8b

Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" · 67b4eab9

由 Bjorn Helgaas 提交于 2月 17, 2016

Revert 811a4e6f ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").

This is part of reverting 991de2e5 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NRafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>

67b4eab9

Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled" · fe25d078

由 Bjorn Helgaas 提交于 2月 17, 2016

Revert 8affb487 ("x86/PCI: Don't alloc pcibios-irq when MSI is
enabled").

This is part of reverting 991de2e5 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NRafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
CC: Joerg Roedel <jroedel@suse.de>

fe25d078

17 2月, 2016 7 次提交

powerpc/ioda: Set "read" permission when "write" is set · 6ecad912

由 Alexey Kardashevskiy 提交于 2月 17, 2016

Quite often drivers set only "write" permission assuming that this
includes "read" permission as well and this works on plenty of
platforms. However IODA2 is strict about this and produces an EEH when
"read" permission is not set and reading happens.

This adds a workaround in the IODA code to always add the "read" bit
when the "write" bit is set.

Fixes: 10b35b2b ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE")
Cc: stable@vger.kernel.org # 4.2+
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: NDouglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6ecad912

arm64: dma-mapping: fix handling of devices registered before arch_initcall · 722ec35f

由 Marek Szyprowski 提交于 2月 16, 2016

This patch ensures that devices, which got registered before arch_initcall
will be handled correctly by IOMMU-based DMA-mapping code.

Cc: <stable@vger.kernel.org>
Fixes: 13b8629f ("arm64: Add IOMMU dma_ops")
Acked-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

722ec35f

hpet: Drop stale URLs · 4e7f9df2

由 Michael S. Tsirkin 提交于 2月 11, 2016

Looks like the HPET spec at intel.com got moved.
It isn't hard to find so drop the link, just mention
the revision assumed.
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

4e7f9df2

x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() · a82eee74

由 Toshi Kani 提交于 2月 11, 2016

Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices.  This problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes).  The
BTT driver updates the BTT map table, which entry size is
4 bytes.  Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.

Change __copy_user_nocache() to use non-temporal store when
a request size is 4 bytes.  The change extends the current
byte-copy path for a less-than-8-bytes request, and does not
add any overhead to the regular path.
Reported-and-tested-by: NMicah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: NBrian Boylston <brian.boylston@hpe.com>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
[ Small readability edits. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a82eee74

x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable · ee9737c9

由 Toshi Kani 提交于 2月 11, 2016

Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.

Also change numeric branch target labels to named local labels.

No code changed:

 arch/x86/lib/copy_user_64.o:

    text    data     bss     dec     hex filename
    1239       0       0    1239     4d7 copy_user_64.o.before
    1239       0       0    1239     4d7 copy_user_64.o.after

 md5:
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brian.boylston@hpe.com
Cc: dan.j.williams@intel.com
Cc: linux-nvdimm@lists.01.org
Cc: micah.parrish@hpe.com
Cc: ross.zwisler@linux.intel.com
Cc: vishal.l.verma@intel.com
Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
[ Small readability edits and added object file comparison. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ee9737c9

s390/maccess: reduce stnsm instructions · 52499d93

由 Heiko Carstens 提交于 2月 12, 2016

When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
on kdump") both Christian and I missed that we can save an additional
stnsm instruction.

This saves us a couple of cycles which could improve the speed of
memcpy_real.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

52499d93

perf/x86/amd/uncore: Plug reference leak · 8bc9162c

由 Thomas Gleixner 提交于 2月 16, 2016

In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore
struct is freed, but the percpu pointer still references it. Set it to NULL.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanosSigned-off-by: NIngo Molnar <mingo@kernel.org>

8bc9162c

16 2月, 2016 1 次提交

arm64/efi: Make strnlen() available to the EFI namespace · 7f4e3462

由 Thierry Reding 提交于 2月 16, 2016

Changes introduced in the upstream version of libfdt pulled in by commit
91feabc2 ("scripts/dtc: Update to upstream commit b06e55c88b9b") use
the strnlen() function, which isn't currently available to the EFI name-
space. Add it to the EFI namespace to avoid a linker error.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NThierry Reding <treding@nvidia.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

7f4e3462

15 2月, 2016 5 次提交

arm/arm64: crypto: assure that ECB modes don't require an IV · bee038a4

由 Jeremy Linton 提交于 2月 12, 2016

ECB modes don't use an initialization vector. The kernel
/proc/crypto interface doesn't reflect this properly.
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

bee038a4

xen/pcifront: Report the errors better. · 2cfec6a2

由 Konrad Rzeszutek Wilk 提交于 2月 11, 2016

The messages should be different depending on the type of error.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

2cfec6a2

powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8

由 Aneesh Kumar K.V 提交于 2月 09, 2016

With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.

Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.

Consider following race:

		CPU0				CPU1
shrink_page_list()
  add_to_swap()
    split_huge_page_to_list()
      __split_huge_pmd_locked()
        pmdp_huge_clear_flush_notify()
	// pmd_none() == true
					exit_mmap()
					  unmap_vmas()
					    zap_pmd_range()
					      // no action on pmd since pmd_none() == true
	pmd_populate()

As result the THP will not be freed. The leak is detected by check_mm():

	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512

The above required us to not mark pmd none during a pmd split.

The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)

Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.

Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c777e2a8

powerpc/powernv: Fix stale PE primary bus · 1bc74f1c

由 Gavin Shan 提交于 2月 09, 2016

When PCI bus is unplugged during full hotplug for EEH recovery,
the platform PE instance (struct pnv_ioda_pe) isn't released and
it dereferences the stale PCI bus that has been released. It leads
to kernel crash when referring to the stale PCI bus.

This fixes the issue by correcting the PE's primary bus when it's
oneline at plugging time, in pnv_pci_dma_bus_setup() which is to
be called by pcibios_fixup_bus().

Cc: stable@vger.kernel.org # v4.1+
Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Reported-by: NPradipta Ghosh <pradghos@in.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1bc74f1c

powerpc/eeh: Fix stale cached primary bus · 05ba75f8

由 Gavin Shan 提交于 2月 09, 2016

When PE is created, its primary bus is cached to pe->bus. At later
point, the cached primary bus is returned from eeh_pe_bus_get().
However, we could get stale cached primary bus and run into kernel
crash in one case: full hotplug as part of fenced PHB error recovery
releases all PCI busses under the PHB at unplugging time and recreate
them at plugging time. pe->bus is still dereferencing the PCI bus
that was released.

This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity
of pe->bus. pe->bus is updated when its first child EEH device is
online and the flag is set. Before unplugging in full hotplug for
error recovery, the flag is cleared.

Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE")
Cc: stable@vger.kernel.org #v3.11+
Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Reported-by: NPradipta Ghosh <pradghos@in.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

05ba75f8

openanolis / cloud-kernel 12 个月 前同步成功

openanolis / cloud-kernel
12 个月前同步成功