1. 11 6月, 2017 2 次提交
    • W
      KVM: async_pf: avoid async pf injection when in guest mode · 9bc1f09f
      Wanpeng Li 提交于
       INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
             Not tainted 4.12.0-rc4+ #8
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       gnome-terminal- D    0  1734   1015 0x00000000
       Call Trace:
        __schedule+0x3cd/0xb30
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? __vfs_read+0x37/0x150
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
      
      This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
      and then gives stress to memory on L1, I can observed this hang on L1 when
      at least ~70% swap area is occupied on L0.
      
      This is due to async pf was injected to L2 which should be injected to L1,
      L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
      actually), and L1 guest starts accumulating tasks stuck in D state in
      kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
      
      This patch fixes the hang by doing async pf when executing L1 guest.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9bc1f09f
    • G
      hexagon: Use raw_copy_to_user · 4d801cca
      Guenter Roeck 提交于
      Commit ac4691fa ("hexagon: switch to RAW_COPY_USER") replaced
      __copy_to_user_hexagon() with raw_copy_to_user(), but did not catch
      all callers, resulting in the following build error.
      
      arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon':
      arch/hexagon/mm/uaccess.c:40:3: error:
      	implicit declaration of function '__copy_to_user_hexagon'
      
      Fixes: ac4691fa ("hexagon: switch to RAW_COPY_USER")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: NRichard Kuo <rkuo@codeaurora.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      4d801cca
  2. 09 6月, 2017 1 次提交
  3. 08 6月, 2017 4 次提交
    • W
      KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation · a3641631
      Wanpeng Li 提交于
      If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
      potentially can be exploited the vulnerability. this will out-of-bounds
      read and write.  Luckily, the effect is small:
      
      	/* when no next entry is found, the current entry[i] is reselected */
      	for (j = i + 1; ; j = (j + 1) % nent) {
      		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
      		if (ej->function == e->function) {
      
      It reads ej->maxphyaddr, which is user controlled.  However...
      
      			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
      
      After cpuid_entries there is
      
      	int maxphyaddr;
      	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */
      
      So we have:
      
      - cpuid_entries at offset 1B50 (6992)
      - maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
      - padding at 27D4...27DF
      - emulate_ctxt at 27E0
      
      And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
      would have been much worse.
      
      This patch fixes it by modding the index to avoid the out-of-bounds
      access. Worst case, i == j and ej->function == e->function,
      the loop can bail out.
      Reported-by: NMoguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Guofang Mo <moguofang@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      a3641631
    • M
      powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by default · c6ee9619
      Michael Ellerman 提交于
      The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with
      other general kernel options. It's really more at home in the "Platform
      Support" section, so move it there.
      
      Also enable it by default, for Book3s 64. It does mostly nothing unless
      the device tree properties are found, and we will want it enabled
      eventually in distro kernels, so turn it on to start getting more
      testing.
      
      Fixes: 5a61ef74 ("powerpc/64s: Support new device tree binding for discovering CPU features")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c6ee9619
    • A
      powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space · 92d9dfda
      Aneesh Kumar K.V 提交于
      Supporting 512TB requires us to do a order 3 allocation for level 1 page
      table (pgd). This results in page allocation failures with certain workloads.
      For now limit 4k linux page size config to 64TB.
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Reported-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      92d9dfda
    • D
      x86/microcode/intel: Clear patch pointer before jettisoning the initrd · 5b0bc9ac
      Dominik Brodowski 提交于
      During early boot, load_ucode_intel_ap() uses __load_ucode_intel()
      to obtain a pointer to the relevant microcode patch (embedded in the
      initrd), and stores this value in 'intel_ucode_patch' to speed up the
      microcode patch application for subsequent CPUs.
      
      On resuming from suspend-to-RAM, however, load_ucode_ap() calls
      load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is
      long gone so the pointer stored in 'intel_ucode_patch' no longer points to
      a valid microcode patch.
      
      Clear that pointer so that we effectively fall back to the CPU hotplug
      notifier callbacks to update the microcode.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      [ Edit and massage commit message. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.10..
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5b0bc9ac
  4. 07 6月, 2017 13 次提交
  5. 06 6月, 2017 6 次提交
    • W
      KVM: nVMX: Fix exception injection · d4912215
      Wanpeng Li 提交于
       WARNING: CPU: 3 PID: 2840 at arch/x86/kvm/vmx.c:10966 nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       CPU: 3 PID: 2840 Comm: qemu-system-x86 Tainted: G           OE   4.12.0-rc3+ #23
       RIP: 0010:nested_vmx_vmexit+0xdcd/0xde0 [kvm_intel]
       Call Trace:
        ? kvm_check_async_pf_completion+0xef/0x120 [kvm]
        ? rcu_read_lock_sched_held+0x79/0x80
        vmx_queue_exception+0x104/0x160 [kvm_intel]
        ? vmx_queue_exception+0x104/0x160 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x1171/0x1ce0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x240 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x240 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This is triggered occasionally by running both win7 and win2016 in L2, in
      addition, EPT is disabled on both L1 and L2. It can't be reproduced easily.
      
      Commit 0b6ac343 (KVM: nVMX: Correct handling of exception injection) mentioned
      that "KVM wants to inject page-faults which it got to the guest. This function
      assumes it is called with the exit reason in vmcs02 being a #PF exception".
      Commit e011c663 (KVM: nVMX: Check all exceptions for intercept during delivery to
      L2) allows to check all exceptions for intercept during delivery to L2. However,
      there is no guarantee the exit reason is exception currently, when there is an
      external interrupt occurred on host, maybe a time interrupt for host which should
      not be injected to guest, and somewhere queues an exception, then the function
      nested_vmx_check_exception() will be called and the vmexit emulation codes will
      try to emulate the "Acknowledge interrupt on exit" behavior, the warning is
      triggered.
      
      Reusing the exit reason from the L2->L0 vmexit is wrong in this case,
      the reason must always be EXCEPTION_NMI when injecting an exception into
      L1 as a nested vmexit.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Fixes: e011c663 ("KVM: nVMX: Check all exceptions for intercept during delivery to L2")
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d4912215
    • P
      kvm: async_pf: fix rcu_irq_enter() with irqs enabled · bbaf0e2b
      Paolo Bonzini 提交于
      native_safe_halt enables interrupts, and you just shouldn't
      call rcu_irq_enter() with interrupts enabled.  Reorder the
      call with the following local_irq_disable() to respect the
      invariant.
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      bbaf0e2b
    • M
      powerpc/perf: Fix Power9 test_adder fields · 8c218578
      Madhavan Srinivasan 提交于
      Commit 8d911904 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
      was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
      the use of PMC5 using raw event code. But instead of updating the
      power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
      power9_pmu structure. Fix it.
      
      Fixes: 8d911904 ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
      Reported-by: NShriya <shriyak@linux.vnet.ibm.com>
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Tested-by: NShriya <shriyak@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8c218578
    • M
      powerpc/numa: Fix percpu allocations to be NUMA aware · ba4a648f
      Michael Ellerman 提交于
      In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
      switched to the generic implementation of cpu_to_node(), which uses a percpu
      variable to hold the NUMA node for each CPU.
      
      Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
      of our percpu areas, leading to a chicken and egg problem. In practice what
      happens is when we are setting up the percpu areas, cpu_to_node() reports that
      all CPUs are on node 0, so we allocate all percpu areas on node 0.
      
      This is visible in the dmesg output, as all pcpu allocs being in group 0:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
        pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
        pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
      
      To fix it we need an early_cpu_to_node() which can run prior to percpu being
      setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
      in. With the patch dmesg output shows two groups, 0 and 1:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
        pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
        pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
      
      We can also check the data_offset in the paca of various CPUs, with the fix we
      see:
      
        CPU 0:  data_offset = 0x0ffe8b0000
        CPU 24: data_offset = 0x1ffe5b0000
      
      And we can see from dmesg that CPU 24 has an allocation on node 1:
      
        node   0: [mem 0x0000000000000000-0x0000000fffffffff]
        node   1: [mem 0x0000001000000000-0x0000001fffffffff]
      
      Cc: stable@vger.kernel.org # v3.16+
      Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ba4a648f
    • B
      powerpc/kernel: Initialize load_tm on task creation · 7f22ced4
      Breno Leitao 提交于
      Currently tsk->thread.load_tm is not initialized in the task creation
      and can contain garbage on a new task.
      
      This is an undesired behaviour, since it affects the timing to enable
      and disable the transactional memory laziness (disabling and enabling
      the MSR TM bit, which affects TM reclaim and recheckpoint in the
      scheduling process).
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7f22ced4
    • D
      1b4af13f
  6. 05 6月, 2017 4 次提交
    • A
      ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M · 06a4b6d0
      Ard Biesheuvel 提交于
      As reported by Patrice, the header layout of the decompressor is
      incorrect when building for v7-M. In this case, the __nop macro
      resolves to 'mov r0, r0', which is emitted as a narrow encoding,
      resulting in the header data fields to end up at lower offsets than
      required.
      
      Given the variety of targets we need to support with the same code,
      the startup sequence is a bit of a jumble, and uses instructions
      and macros whose encoding widths cannot be specified (badr), or only
      exist in a narrow encoding (bx)
      
      So force the use of a wide encoding in __nop, and replace the start
      sequence with a simple jump to the label marking the start of code,
      preceded by a Thumb2 mode switch if required (using explicit wide
      encodings where appropriate). The label itself can be moved to the
      start of code [where it belongs] due to the larger range of branch
      instructions as compared to adr instructions.
      Reported-by: NPatrice CHOTARD <patrice.chotard@st.com>
      Acked-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      06a4b6d0
    • V
      ARM: 8676/1: NOMMU: provide pgprot_device() macro · 7ef4783e
      Vladimir Murzin 提交于
      NOMMU build leads to the following error:
      
        CC      drivers/pci/mmap.o
      drivers/pci/mmap.c: In function 'pci_mmap_resource_range':
      drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration]
         vma->vm_page_prot = pgprot_device(vma->vm_page_prot);
         ^
      
      cc1: some warnings being treated as errors
      scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed
      make[2]: *** [drivers/pci/mmap.o] Error 1
      scripts/Makefile.build:561: recipe for target 'drivers/pci' failed
      make[1]: *** [drivers/pci] Error 2
      Makefile:1016: recipe for target 'drivers' failed
      make: *** [drivers] Error 2
      
      Fix it with support of pgprot_device() macro for NOMMU.
      
      Fixes: 00d2904f ("ARM/PCI: Use generic pci_mmap_resource_range()")
      Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      7ef4783e
    • C
      x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC · ae1d557d
      Christian Sünkenberg 提交于
      A SoC variant of Geode GX1, notably NSC branded SC1100, seems to
      report an inverted Device ID in its DIR0 configuration register,
      specifically 0xb instead of the expected 0x4.
      
      Catch this presumably quirky version so it's properly recognized
      as GX1 and has its cache switched to write-back mode, which provides
      a significant performance boost in most workloads.
      
      SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip",
      states in section 1.1.7.1 "Device ID" that device identification
      values are specified in SC1100's device errata. These, however,
      seem to not have been publicly released.
      
      Wading through a number of boot logs and /proc/cpuinfo dumps found on
      pastebin and blogs, this patch should mostly be relevant for a number
      of now admittedly aging Soekris NET4801 and PC Engines WRAP devices,
      the latter being the platform this issue was discovered on.
      Performance impact was verified using "openssl speed", with
      write-back caching scaling throughput between -3% and +41%.
      Signed-off-by: NChristian Sünkenberg <christian.suenkenberg@student.kit.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ae1d557d
    • B
      powerpc/kernel: Fix FP and vector register restoration · 1195892c
      Breno Leitao 提交于
      Currently tsk->thread->load_vec and load_fp are not initialized during
      task creation, which can lead to garbage values in these variables (non-zero
      values).
      
      These variables will be checked later in restore_math() to validate if the
      FP and vector registers are being utilized. Since these values might be
      non-zero, the restore_math() will continue to save the FP and vectors even if
      they were never utilized by the userspace application. load_fp and load_vec
      counters will then overflow (they wrap at 255) and the FP and Altivec will be
      finally disabled, but before that condition is reached (counter overflow)
      several context switches will have restored FP and vector registers without
      need, causing a performance degradation.
      
      Fixes: 70fe3d98 ("powerpc: Restore FPU/VEC/VSX if previously used")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NGustavo Romero <gusbromero@gmail.com>
      Acked-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1195892c
  7. 03 6月, 2017 1 次提交
  8. 02 6月, 2017 4 次提交
  9. 01 6月, 2017 5 次提交