1. 05 Aug 2017, 2 commits
  2. 04 Aug 2017, 2 commits
  3. 02 Aug 2017, 1 commit
  4. 31 Jul 2017, 4 commits
• parisc: Define CONFIG_CPU_BIG_ENDIAN · 74ad3d28
  Committed by Babu Moger
While working on enabling queued rwlocks on SPARC, I found the
following code in include/asm-generic/qrwlock.h, which uses
CONFIG_CPU_BIG_ENDIAN to select the byte to clear.
      
static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
	return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}
      
The problem is that many of the fixed big-endian architectures
don't define CPU_BIG_ENDIAN, so the wrong byte gets cleared.

Define CPU_BIG_ENDIAN for the parisc architecture to fix it.
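As a hedged illustration, here is a small user-space analogue of
__qrwlock_write_byte(); BIG_ENDIAN_BUILD is a hypothetical stand-in
for IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN), and leaving it at 0 mimics an
undefined Kconfig symbol:

  #include <stdint.h>
  #include <stdio.h>

  #define BIG_ENDIAN_BUILD 0   /* mimics an undefined Kconfig symbol */

  static uint8_t *write_byte(uint32_t *lock)
  {
          /* writer byte: byte 0 on little endian, byte 3 on big endian */
          return (uint8_t *)lock + 3 * BIG_ENDIAN_BUILD;
  }

  int main(void)
  {
          uint32_t lock = 0xffffffff;

          *write_byte(&lock) = 0;
          /* prints 0xffffff00 on little endian; a big-endian CPU built
           * with BIG_ENDIAN_BUILD at 0 would clear this same, wrong byte */
          printf("lock word: 0x%08x\n", lock);
          return 0;
  }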
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
• powerpc/64s: Fix stack setup in watchdog soft_nmi_common() · cc491f1d
  Committed by Nicholas Piggin
      The watchdog soft-NMI exception stack setup loads a stack pointer
      twice, which is an obvious error. It ends up using the system reset
      interrupt (true-NMI) stack, which is also a bug because the watchdog
      could be preempted by a system reset interrupt that overwrites the
      NMI stack.
      
      Change the soft-NMI to use the "emergency stack". The current kernel
      stack is not used, because of the longer-term goal to prevent
      asynchronous stack access using soft-disable.
      
      Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• parisc: Increase thread and stack size to 32kb · 8f8201df
  Committed by Helge Deller
Since kernel 4.11, the thread and IRQ stacks on parisc have been
randomly overflowing their default size of 16k. Why stack usage
suddenly grew is not yet known.
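A minimal sketch of the kind of change this describes, assuming the
usual THREAD_SIZE_ORDER scheme from asm/thread_info.h (names and
values here are illustrative, not the literal patch):

  #define PAGE_SHIFT        12                    /* assuming 4k pages */
  #define PAGE_SIZE         (1UL << PAGE_SHIFT)
  #define THREAD_SIZE_ORDER 3                     /* was 2, i.e. 16k */
  #define THREAD_SIZE       (PAGE_SIZE << THREAD_SIZE_ORDER)   /* 32k */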
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Helge Deller <deller@gmx.de>
• parisc: Handle vmas whose context is not current in flush_cache_range · 13d57093
  Committed by John David Anglin
      In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
      statement in flush_cache_range() during a system shutdown:
      
      kernel BUG at arch/parisc/kernel/cache.c:595!
      CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
      Workqueue: events free_ioctx
      
       IAOQ[0]: flush_cache_range+0x144/0x148
       IAOQ[1]: flush_cache_page+0x0/0x1a8
       RP(r2): flush_cache_range+0xec/0x148
      Backtrace:
       [<00000000402910ac>] unmap_page_range+0x84/0x880
       [<00000000402918f4>] unmap_single_vma+0x4c/0x60
       [<0000000040291a18>] zap_page_range_single+0x110/0x160
       [<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
       [<000000004026ccd8>] truncate_pagecache+0x50/0xa8
       [<000000004026cd84>] truncate_setsize+0x54/0x70
       [<000000004033d534>] put_aio_ring_file+0x44/0xb0
       [<000000004033d5d8>] aio_free_ring+0x38/0x140
       [<000000004033d714>] free_ioctx+0x34/0xa8
       [<00000000401b0028>] process_one_work+0x1b8/0x4d0
       [<00000000401b04f4>] worker_thread+0x1b4/0x648
       [<00000000401b9128>] kthread+0x1b0/0x208
       [<0000000040150020>] end_fault_vector+0x20/0x28
       [<0000000040639518>] nf_ip_reroute+0x50/0xa8
       [<0000000040638ed0>] nf_ip_route+0x10/0x78
       [<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8
      
      CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
      Workqueue: events free_ioctx
      Backtrace:
       [<0000000040163bf0>] show_stack+0x20/0x38
       [<0000000040688480>] dump_stack+0xa8/0x120
       [<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
       [<0000000040164d0c>] handle_interruption+0xa24/0xa48
      
This patch modifies flush_cache_range() to handle non-current
contexts. Since this occurs infrequently, the simplest approach is
to flush the entire cache when it happens.
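A hedged sketch of that approach; the mfsp(3) space-register check is
an assumption about how "current context" is detected, not a verbatim
copy of the patch:

  void flush_cache_range(struct vm_area_struct *vma,
                         unsigned long start, unsigned long end)
  {
          /* The vma may belong to a non-current context (e.g. a
           * kworker tearing down another process's mappings); in
           * that rare case flush everything instead of BUG()ing. */
          if (vma->vm_mm->context != mfsp(3)) {
                  flush_cache_all();
                  return;
          }

          /* ... ranged flush for the current context, as before ... */
  }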
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
5. 30 Jul 2017, 1 commit
• cpufreq: x86: Make scaling_cur_freq behave more as expected · 4815d3c5
  Committed by Rafael J. Wysocki
      After commit f8475cef "x86: use common aperfmperf_khz_on_cpu() to
      calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
      in sysfs only behaves as expected on x86 with APERF/MPERF registers
available when it is read at least twice in a row.  The value
returned by the first read may not be meaningful, because the
computation uses cached values from the previous iteration
of aperfmperf_snapshot_khz(), which may be stale.
      
      To prevent that from happening, modify arch_freq_get_on_cpu() to
      call aperfmperf_snapshot_khz() twice, with a short delay between
      these calls, if the previous invocation of aperfmperf_snapshot_khz()
was too far back in the past (specifically, more than 1s ago).
      
      Also, as pointed out by Doug Smythies, aperf_delta is limited now
      and the multiplication of it by cpu_khz won't overflow, so simplify
      the s->khz computations too.
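A hedged sketch of the resulting flow (helper and variable names
follow the commit text; details are illustrative):

  unsigned int arch_freq_get_on_cpu(int cpu)
  {
          unsigned int khz;

          /* first snapshot; may still be based on stale cached counts */
          smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
          khz = per_cpu(samples.khz, cpu);
          if (khz)        /* previous snapshot was recent enough */
                  return khz;

          /* otherwise wait briefly and sample again for a real delta */
          msleep(APERFMPERF_REFRESH_DELAY_MS);
          smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);

          return per_cpu(samples.khz, cpu);
  }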
      
      Fixes: f8475cef "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF"
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
6. 28 Jul 2017, 7 commits
• powerpc/powernv/pci: Return failure for some uses of dma_set_mask() · 253fd51e
  Committed by Alistair Popple
      Commit 8e3f1b1d ("powerpc/powernv/pci: Enable 64-bit devices to access
      >4GB DMA space") introduced the ability for PCI device drivers to request a
      DMA mask between 64 and 32 bits and actually get a mask greater than
32 bits. However, if certain machine-configuration-dependent
conditions are not met, the code currently falls back silently to a
32-bit mask.
      
This makes it hard for device drivers to detect which mask they
actually got. Instead, we should return an error when the request
cannot be fulfilled, which allows drivers either to fall back or to
implement other workarounds, as documented in DMA-API-HOWTO.txt.
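For reference, the driver-side pattern this enables is the one from
DMA-API-HOWTO.txt; a hedged sketch (the 48-bit mask is just an
example):

  /* try a wide mask first; fall back if the platform refuses it */
  if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(48))) {
          dev_warn(&pdev->dev, "48-bit DMA unavailable, using 32-bit\n");
          if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
                  return -EIO;    /* no usable DMA addressing */
  }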
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler · 65c5ec11
  Committed by Michael Ellerman
      Historically the boot wrapper was always built 32-bit big endian, even
      for 64-bit kernels. That was because old firmwares didn't necessarily
      support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
      uses CROSS32CC for compilation.
      
      However when we added 64-bit little endian support, we also added
      support for building the boot wrapper 64-bit. However we kept using
      CROSS32CC, because in most cases it is just CC and everything works.
      
      However if the user doesn't specify CROSS32_COMPILE (which no one ever
      does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
      becomes just "gcc". On native systems that is probably OK, but if we're
cross building it definitely isn't, leading to e.g.:
      
        gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
        gcc: error: unrecognized argument in option ‘-mabi=elfv2’
        gcc: error: unrecognized command line option ‘-mlittle-endian’
        make: *** [zImage] Error 2
      
      To fix it, stop using CROSS32CC, because we may or may not be building
      32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
      CROSS32_COMPILE if it's set and we're building for 32-bit.
      
      Fixes: 147c0516 ("powerpc/boot: Add support for 64bit little endian wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
• powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU · 7b7622bb
  Committed by Michael Ellerman
      In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
      CPU, which means it has to run *on* the boot CPU.
      
      In the past we ensured it ran on the boot CPU by changing the CPU
      affinity mask of current directly. That was removed in commit
      6d11b87d ("powerpc/smp: Replace open coded task affinity logic"),
      and replaced with a work queue call.
      
      Unfortunately using a work queue leads to a lockdep warning, now that
      the CPU hotplug lock is a regular semaphore:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        ...
        kworker/0:1/971 is trying to acquire lock:
         (cpu_hotplug_lock.rw_sem){++++++}, at: [<c000000000100974>] apply_workqueue_attrs+0x34/0xa0
      
        but task is already holding lock:
         ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
        ...
             CPU0                    CPU1
             ----                    ----
        lock((&wfc.work));
                                     lock(cpu_hotplug_lock.rw_sem);
                                     lock((&wfc.work));
        lock(cpu_hotplug_lock.rw_sem);
      
      Although the deadlock can't happen in practice, because
      smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
      lockdep can't tell that.
      
      Luckily in commit 8fb12156 ("init: Pin init task to the boot CPU,
      initially") tglx changed the generic code to pin init to the boot CPU
      to begin with. The unpinning of init from the boot CPU happens in
      sched_init_smp(), which is called after smp_cpus_done().
      
      So smp_cpus_done() is always called on the boot CPU, which means we
      don't need the work queue call at all - and the lockdep warning goes
      away.
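A hedged sketch of the simplified routine (structure per the commit
text; surrounding details omitted):

  void __init smp_cpus_done(unsigned int max_cpus)
  {
          /*
           * We run on the boot CPU here, because init is pinned to it
           * until sched_init_smp(), which runs after smp_cpus_done(),
           * so setup_cpu() can be called directly - no work queue.
           */
          if (smp_ops && smp_ops->setup_cpu)
                  smp_ops->setup_cpu(boot_cpuid);
  }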
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
• arm64: mmu: Place guard page after mapping of kernel image · 92bbd16e
  Committed by Will Deacon
The vast majority of virtual allocations in the vmalloc region are
followed by a guard page, which can help to avoid overrunning from
one vma into another that may map a read-sensitive device.
      
      This patch adds a guard page to the end of the kernel image mapping (i.e.
      following the data/bss segments).
      
      Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
• x86/boot: Disable the address-of-packed-member compiler warning · 20c6c189
  Committed by Matthias Kaehlcke
The clang warning 'address-of-packed-member' is disabled for the
general kernel code; disable it for the x86 boot code as well.
      
      This suppresses a bunch of warnings like this when building with clang:
      
      ./arch/x86/include/asm/processor.h:535:30: warning: taking address of
        packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
        unaligned pointer value [-Waddress-of-packed-member]
          return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                      ^~~~~~~~~~~~~~~~~~~
      ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
        'this_cpu_read_stable'
          #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                          ^~~
      ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
        'percpu_stable_op'
          : "p" (&(var)));
                   ^~~
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• powerpc/tm: Fix saving of TM SPRs in core dump · cd63f3cf
  Committed by Gustavo Romero
      Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
      TFIAR, TEXASR) to the thread struct, unless the process is currently
      inside a suspended transaction.
      
      If the process is core dumping, and the TM SPRs have changed since the
      last time the process was context switched, then we will save stale
      values of the TM SPRs to the core dump.
      
      Fix it by saving the live register state to the thread struct in that
      case.
      
      Fixes: 08e1c01d ("powerpc/ptrace: Enable support for TM SPR state")
      Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries · c9c98bc5
  Committed by Oliver O'Halloran
      The Radix MMU translation tree as defined in ISA v3.0 contains two
      different types of entry, directories and leaves. Leaves are
      identified by _PAGE_PTE being set.
      
      The formats of the two entries are different, with the directory
      entries containing no spare bits for use by software. In particular
      the bit we use for _PAGE_DEVMAP is not reserved for software, and is
      part of the NLB (Next Level Base) field, essentially the address of
      the next level in the tree.
      
Note that leaf-ness is not tied to the Linux pte_t type: a huge
page pmd entry (or a devmap one!) is also a leaf and so has
_PAGE_PTE set, even though we use a pmd_t for it in Linux.
      
The fix is to ensure that pmd_devmap()/pte_devmap() confirm they are
looking at a leaf entry (_PAGE_PTE) as well as checking _PAGE_DEVMAP.
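A hedged sketch of the check (the be64 handling mirrors how Radix
PTE bits are usually tested; the exact form may differ):

  static inline int pte_devmap(pte_t pte)
  {
          /*
           * _PAGE_DEVMAP is only meaningful on a leaf entry; on a
           * directory entry the same bit is part of the NLB address.
           */
          u64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE);

          return (pte_raw(pte) & mask) == mask;
  }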
      
      Fixes: ebd31197 ("powerpc/mm: Add devmap support for ppc64")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add a comment in the code and flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
7. 27 Jul 2017, 7 commits
• drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU · a3287c41
  Committed by Will Deacon
Since the PMU register interface is banked per CPU, CPU PMU
interrupts cannot be handled by a CPU other than the one whose PMU
is asserting the interrupt. This means that migrating PMU SPIs, as
we do during a CPU hotplug operation, doesn't make any sense and can
lead to the IRQ being disabled entirely if we route a spurious IRQ
to the new affinity target.
      
      This has been observed in practice on AMD Seattle, where CPUs on the
      non-boot cluster appear to take a spurious PMU IRQ when coming online,
      which is routed to CPU0 where it cannot be handled.
      
      This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
      affinity prior to requesting them, ensuring that they cannot
      be migrated during hotplug events. This interacts badly with the DB8500
      erratum workaround that ping-pongs the interrupt affinity from the handler,
      so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
      to be overridden in the platdata.
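A hedged sketch of the request path described (handler and dev-id
names are illustrative):

  /* pin the SPI to its CPU before requesting it, then mark it
   * per-CPU so the core never migrates it across hotplug */
  irq_force_affinity(irq, cpumask_of(cpu));
  err = request_irq(irq, armpmu->handle_irq,
                    IRQF_PERCPU | IRQF_NOBALANCING | IRQF_NO_THREAD,
                    "arm-pmu", per_cpu_ptr(&cpu_armpmu, cpu));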
      
      Fixes: 3cf7ee98 ("drivers/perf: arm_pmu: move irq request/free into probe")
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
• powerpc/mm/hash: Free the subpage_prot_table correctly · 0da12a7a
  Committed by Aneesh Kumar K.V
      Fixes: dad6f37c ("powerpc: subpage_protect: Increase the array size to take care of 64TB")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• KVM: LAPIC: Fix reentrancy issues with preempt notifiers · 1d518c68
  Committed by Wanpeng Li
      Preempt can occur in the preemption timer expiration handler:
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
      hv_timer_in_use == true
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   kvm_lapic_restart_hv_timer
                                     restart_apic_timer
                                       start_hv_timer
                                    already-expired timer or sw timer triggered in the window
                                       start_sw_timer
                                         cancel_hv_timer
                                 /* back in kvm_lapic_expired_hv_timer */
                                 cancel_hv_timer
                                   WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
      
      This can be reproduced if CONFIG_PREEMPT is enabled.
      
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #16
       RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
      Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
       ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
       CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ #16
       RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
      Call Trace:
        kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xb8/0xd70 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
        ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x81/0x220
        entry_SYSCALL64_slow_path+0x25/0x25
      
This patch fixes it by making the callers of cancel_hv_timer,
start_hv_timer and start_sw_timer run in preemption-disabled
regions, which trivially avoids any reentrancy issue with the
preempt notifiers.
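A hedged sketch of the pattern (restart_apic_timer() is the common
caller of start_hv_timer()/start_sw_timer() in the trace above):

  preempt_disable();
  /* with preemption off, a sched-out cannot run the preempt
   * notifier (kvm_arch_vcpu_load -> restart_apic_timer) in the
   * middle of our own start/cancel sequence */
  restart_apic_timer(vcpu->arch.apic);
  preempt_enable();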
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Add more WARNs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• KVM: nVMX: Fix loss of L2's NMI blocking state · 2d6144e3
  Committed by Wanpeng Li
      Run kvm-unit-tests/eventinj.flat in L1 w/ ept=0 on both L0 and L1:
      
      Before NMI IRET test
      Sending NMI to self
      NMI isr running stack 0x461000
      Sending nested NMI to self
      After nested NMI to self
      Nested NMI isr running rip=40038e
      After iret
      After NMI to self
      FAIL: NMI
      
Commit 4c4a6f79 (KVM: nVMX: track NMI blocking state separately
for each VMCS) tracks NMI blocking state separately for vmcs01 and
vmcs02. However, it is not enough:
      
 - The L2 (kvm-unit-tests/eventinj.flat) generates an NMI that will fault
   on IRET, so the L2 can generate a #PF which can be intercepted by L0.
 - L0 walks L1's guest page table and sees the mapping is invalid, so it
   resumes the L1 guest and injects the #PF into L1.  At this point the
   vmcs02 has nmi_known_unmasked=true.
 - L1 sets bit 3 (blocking by NMI) in the interruptibility-state field
         of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
       - L1 executes VMRESUME to resume L2, causing a vmexit to L0
       - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
         interruptibility-state field of vmcs02, but nmi_known_unmasked is
         still true.
       - L2 immediately exits to L0 with another page fault, because L0 still has
         not updated the NGVA->HPA page tables.  However, nmi_known_unmasked is
         true so vmx_recover_nmi_blocking does not do anything.
      
      The fix is to update nmi_known_unmasked when preparing vmcs02 from vmcs12.
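A hedged sketch of that update in prepare_vmcs02() (the exact form
is an assumption based on the commit text):

  /* vmcs12 says NMIs are blocked in L2, so the cached
   * "known unmasked" shortcut must not survive into vmcs02 */
  if (vmcs12->guest_interruptibility_info & GUEST_INTR_STATE_NMI)
          vmx->nmi_known_unmasked = false;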
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode · 06a5524f
  Committed by Wincy Van
The PI vectors for L0 and L1 must be different. If the destination
vcpu0 is in guest mode while vcpu1 is delivering a non-nested PI to
vcpu0, there won't be any vmexit, so the non-nested interrupt will
be delayed.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• x86: irq: Define a global vector for nested posted interrupts · 210f84b0
  Committed by Wincy Van
We are using the same vector for nested and non-nested posted
interrupt delivery. This may cause interrupt latency in L1, since we
can't kick the L2 vcpu out of VMX non-root mode.

This patch introduces a new vector that is used only for nested
posted interrupts, to solve the problem above.
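A hedged sketch of the registration (vector and handler names follow
the upstream naming; treat the call site as illustrative):

  /* dedicated vector: a nested PI arriving while the vcpu is in
   * VMX non-root mode no longer aliases the L1 notification vector */
  alloc_intr_gate(POSTED_INTR_NESTED_VECTOR, kvm_posted_intr_nested_ipi);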
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• KVM: x86: do mask out upper bits of PAE CR3 · a512177e
  Committed by Paolo Bonzini
      This reverts the change of commit f85c758d,
      as the behavior it modified was intended.
      
      The VM is running in 32-bit PAE mode, and Table 4-7 of the Intel manual
      says:
      
      Table 4-7. Use of CR3 with PAE Paging
      Bit Position(s)	Contents
      4:0		Ignored
      31:5		Physical address of the 32-Byte aligned
      		page-directory-pointer table used for linear-address
      		translation
      63:32		Ignored (these bits exist only on processors supporting
      		the Intel-64 architecture)
      
      To placate the static checker, write the mask explicitly as an
      unsigned long constant instead of using a 32-bit unsigned constant.
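A hedged sketch of the masking, derived directly from Table 4-7
(bits 31:5 hold the table address; bits 4:0 and 63:32 are ignored):

  /* unsigned long, not u32: on 64-bit this also clears bits 63:32,
   * and the static checker no longer sees a truncating mask */
  unsigned long pdpt_base = cr3 & 0xffffffe0UL;   /* keep bits 31:5 */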
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: f85c758d
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
8. 26 Jul 2017, 10 commits
• arm64: sysreg: Fix unprotected macro argument in write_sysreg · d0153c7f
  Committed by Dave Martin
      write_sysreg() may misparse the value argument because it is used
      without parentheses to protect it.
      
      This patch adds the ( ) in order to avoid any surprises.
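A hedged sketch of the macro after the fix (modelled on the arm64
definition; minor details may differ):

  #define write_sysreg(v, r) do {                                    \
          u64 __val = (u64)(v);  /* (v), not v: protect e.g. a | b */ \
          asm volatile("msr " __stringify(r) ", %x0"                 \
                       : : "rZ" (__val));                            \
  } while (0)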
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[will: same change to write_sysreg_s]
Signed-off-by: Will Deacon <will.deacon@arm.com>
• powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain · b40b2386
  Committed by Michael Ellerman
      In commit efe0160c ("powerpc/64: Linker on-demand sfpr functions
      for modules"), we added an ld version check early in the powerpc
      top-level Makefile.
      
      Because the Makefile runs before the kernel config is setup, the
      checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
      we end up configuring ld for 32-bit big endian.
      
      That would be OK, except that for historical (or perhaps no) reason,
      we use 'override LD' to add the endian flags to the LD variable
      itself, rather than the normal approach of adding them to LDFLAGS.
      
      The end result is that when we check the ld version we run it as:
      
        $(CROSS_COMPILE)ld -EB -m elf32ppc --version
      
This often works, unless you are using a 64-bit-only and/or
little-endian-only toolchain, in which case you see something like:
      
        $ make defconfig
        powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
        Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
        /bin/sh: 1: [: -ge: unexpected operator
      
      The proper fix is to stop using 'override LD', but that will require a
      fair bit of testing. Instead we can fix it for now just by reordering
      the Makefile to do the version check earlier.
      
      Fixes: efe0160c ("powerpc/64: Linker on-demand sfpr functions for modules")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• powerpc/pseries: Fix of_node_put() underflow during reconfig remove · 4fd1bd44
  Committed by Laurent Vivier
      As for commit 68baf692 ("powerpc/pseries: Fix of_node_put()
      underflow during DLPAR remove"), the call to of_node_put() must be
      removed from pSeries_reconfig_remove_node().
      
      dlpar_detach_node() and pSeries_reconfig_remove_node() both call
      of_detach_node(), and thus the node should not be released in both
      cases.
      
      Fixes: 0829f6d1 ("of: device_node kobject lifecycle fixes")
      Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• powerpc/mm/radix: Workaround prefetch issue with KVM · a25bd72b
  Committed by Benjamin Herrenschmidt
      There's a somewhat architectural issue with Radix MMU and KVM.
      
      When coming out of a guest with AIL (Alternate Interrupt Location, ie,
      MMU enabled), we start executing hypervisor code with the PID register
      still containing whatever the guest has been using.
      
The problem is that the CPU can (and will) then start prefetching
or speculatively loading from whatever host context has that same
PID (if any), thus bringing translations for that context into the
TLB, which Linux doesn't know about.
      
      This can cause stale translations and subsequent crashes.
      
Fixing this in a way that is neither racy nor a huge performance
hit is difficult. We could just make the host invalidations always
use broadcast forms, but that would hurt single-threaded programs,
for example.
      
We chose to fix it instead by partitioning the PID space between
guest and host. This is possible because today Linux only uses 19
of the 20 bits of PID space, so existing guests will work if we
make the host use the top half of the 20-bit space.
      
      We additionally add support for a property to indicate to Linux the
      size of the PID register which will be useful if we eventually have
      processors with a larger PID space available.
      
There is still an issue with malicious guests purposefully setting
the PID register to a value in the host's PID range. Hopefully future HW
      can prevent that, but in the meantime, we handle it with a pair of
      kludges:
      
       - On the way out of a guest, before we clear the current VCPU in the
         PACA, we check the PID and if it's outside of the permitted range
         we flush the TLB for that PID.
      
       - When context switching, if the mm is "new" on that CPU (the
         corresponding bit was set for the first time in the mm cpumask), we
         check if any sibling thread is in KVM (has a non-NULL VCPU pointer
         in the PACA). If that is the case, we also flush the PID for that
         CPU (core).
      
      This second part is needed to handle the case where a process is
      migrated (or starts a new pthread) on a sibling thread of the CPU
      coming out of KVM, as there's a window where stale translations can
      exist before we detect it and flush them out.
      
A future optimization could be added by keeping track of whether
the PID has ever been used, and avoiding this work for completely
fresh PIDs. We could similarly mark PIDs that have been the subject
of a global invalidation as "fresh". But for now this will do.
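A hedged sketch of the partitioning arithmetic (names are
illustrative; the property mentioned above would supply the PID
width):

  /* with a 20-bit PID space, guests keep 0 .. 2^19 - 1 and the
   * host allocates its contexts from the top half */
  unsigned int mmu_pid_bits = 20;  /* may come from a firmware property */
  unsigned int mmu_base_pid = 1U << (mmu_pid_bits - 1);   /* 0x80000 */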
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
      unneeded include of kvm_book3s_asm.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
• parisc: Extend disabled preemption in copy_user_page · 56008c04
  Committed by John David Anglin
It's always bothered me that we only disable preemption in
copy_user_page around the call to flush_dcache_page_asm. This patch
extends the preemption-disabled region to also cover the copy.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
• parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases · ae7a609c
  Committed by John David Anglin
      Helge noticed that we flush the TLB page in flush_cache_page but not in
      flush_cache_range or flush_cache_mm.
      
      For a long time, we have had random segmentation faults building
      packages on machines with PA8800/8900 processors.  These machines only
      support equivalent aliases.  We don't see these faults on machines that
      don't require strict coherency.  So, it appears TLB speculation
      sometimes leads to cache corruption on machines that require coherency.
      
      This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
      coherency is required.  We only flush the TLB in flush_cache_page when
      coherency is required.
      
      The patch also optimizes flush_cache_range.  It turns out we always have
      the right context to use flush_user_dcache_range_asm and
      flush_user_icache_range_asm.
      
      The patch has been tested for some time on rp3440, rp3410 and A500-44.
      It's been boot tested on c8000.  No random segmentation faults were
      observed during testing.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
• parisc: Suspend lockup detectors before system halt · 56188832
  Committed by Helge Deller
Some machines can't power themselves off, so disable the lockup
detectors to avoid this watchdog BUG showing up every few seconds:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+
• parisc: Show DIMM slot number which holds broken memory module · c46bafc4
  Committed by Helge Deller
The Page Deallocation Table (PDT) holds the physical addresses of
all broken memory locations. Given the physical address, we can now
show which DIMM slot (e.g. 1a, 3c) actually holds the broken memory
module, so that users are able to replace it.
Signed-off-by: Helge Deller <deller@gmx.de>
• parisc: Add function to return DIMM slot of physical address · 25a9b765
  Committed by Helge Deller
      Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a
      physical address. This is needed to show users which DIMM module needs
      replacement in case a broken DIMM was encountered.
Signed-off-by: Helge Deller <deller@gmx.de>
• parisc: Fix crash when calling PDC_PAT_MEM PDT firmware function · f520e552
  Committed by Helge Deller
      Commit c9c2877d ("parisc: Add Page Deallocation Table (PDT) support")
introduced the pdc_pat_mem_read_pd_pdt() firmware helper function,
which crashed the system because it trashed the stack whenever the
pdc_pat_mem_read_pd_retinfo struct was located there (the struct is
smaller than the 32 64-bit values the firmware writes back).

Fix it by using the pdc_result struct instead when calling firmware,
and copy the return values back into the result struct when the call
finishes successfully.
      
      While debugging this code I noticed that the pdc_type wasn't set correctly
      either, so let's fix that too.
      
      Fixes: c9c2877d ("parisc: Add Page Deallocation Table (PDT) support")
Signed-off-by: Helge Deller <deller@gmx.de>
9. 25 Jul 2017, 4 commits
• KVM: s390: take srcu lock when getting/setting storage keys · 4f899147
  Committed by Christian Borntraeger
      The following warning was triggered by missing srcu locks around
      the storage key handling functions.
      
      =============================
      WARNING: suspicious RCU usage
      4.12.0+ #56 Not tainted
      -----------------------------
      ./include/linux/kvm_host.h:572 suspicious rcu_dereference_check() usage!
      rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by live_migration/4936:
        #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
      kvm_arch_vm_ioctl+0x6b8/0x22d0
      
       CPU: 8 PID: 4936 Comm: live_migration Not tainted 4.12.0+ #56
       Hardware name: IBM 2964 NC9 704 (LPAR)
       Call Trace:
       ([<000000000011378a>] show_stack+0xea/0xf0)
        [<000000000055cc4c>] dump_stack+0x94/0xd8
        [<000000000012ee70>] gfn_to_memslot+0x1a0/0x1b8
        [<0000000000130b76>] gfn_to_hva+0x2e/0x48
        [<0000000000141c3c>] kvm_arch_vm_ioctl+0x714/0x22d0
        [<000000000013306c>] kvm_vm_ioctl+0x11c/0x7b8
        [<000000000037e2c0>] do_vfs_ioctl+0xa8/0x6c8
        [<000000000037e984>] SyS_ioctl+0xa4/0xb8
        [<00000000008b20a4>] system_call+0xc4/0x27c
       1 lock held by live_migration/4936:
        #0:  (&mm->mmap_sem){++++++}, at: [<0000000000141be0>]
      kvm_arch_vm_ioctl+0x6b8/0x22d0
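A hedged sketch of the locking pattern the fix applies around these
paths (the memslot helpers are the ones in the splat):

  int srcu_idx = srcu_read_lock(&kvm->srcu);
  /* gfn_to_memslot()/gfn_to_hva() dereference the memslot array,
   * which is protected by kvm->srcu, hence the read-side section */
  unsigned long hva = gfn_to_hva(kvm, gfn);
  /* ... get/set the storage key through hva ... */
  srcu_read_unlock(&kvm->srcu, srcu_idx);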
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
• x86/efi: Fix reboot_mode when EFI runtime services are disabled · 4ecf7191
  Committed by Stefan Assmann
When EFI runtime services are disabled, for example by the "noefi"
kernel cmdline parameter, reboot_type could still be set to
BOOT_EFI, causing reboot to fail.
      
      Fix this by checking if EFI runtime services are enabled.
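A hedged sketch of the guard (efi_enabled() is the standard
predicate; the exact call site in the reboot-quirk code may differ):

  /* only pick an EFI reboot if runtime services actually work;
   * "noefi" and friends leave EFI_RUNTIME_SERVICES clear */
  if (efi_enabled(EFI_RUNTIME_SERVICES))
          reboot_type = BOOT_EFI;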
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170724122248.24006-1-sassmann@kpanic.de
      [ Fixed 'not disabled' double negation. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• x86/boot: #undef memcpy() et al in string.c · 18d5e6c3
  Committed by Michael Davidson
#undef memcpy() and friends in boot/string.c so that the functions
defined there get their proper names; otherwise we end up trying to
redefine __builtin_memcpy() etc.
      
      Surprisingly, GCC allows this (and, helpfully, discards the
      __builtin_ prefix from the function name when compiling it),
      but clang does not.
      
      Adding these #undef's appears to preserve what I assume was
      the original intent of the code.
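A hedged sketch of the shape of the fix (boot/string.h maps these
names onto __builtin_* macros; the memcmp body is a plain-C stand-in
for the real implementation):

  #undef memcpy
  #undef memset
  #undef memcmp

  int memcmp(const void *s1, const void *s2, size_t len)
  {
          const unsigned char *a = s1, *b = s2;

          /* without the #undef above, this would try to define a
           * function literally named __builtin_memcmp */
          while (len--) {
                  if (*a != *b)
                          return *a - *b;
                  a++, b++;
          }
          return 0;
  }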
Signed-off-by: Michael Davidson <md@google.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Bernhard.Rosenkranzer@linaro.org
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170724235155.79255-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• arm64/lib: copy_page: use consistent prefetch stride · 288be97c
  Committed by Ard Biesheuvel
      The optional prefetch instructions in the copy_page() routine are
      inconsistent: at the start of the function, two cachelines are
      prefetched beyond the one being loaded in the first iteration, but
      in the loop, the prefetch is one more line ahead. This appears to
      be unintentional, so let's fix it.
      
      While at it, fix the comment style and white space.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
10. 24 Jul 2017, 2 commits
• ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[] · ce184a0d
  Committed by Dave Martin
      In kernels with CONFIG_IWMMXT=y running on non-iWMMXt hardware, the
      signal frame can be left partially uninitialised in such a way
      that userspace cannot parse uc_regspace[] safely.  In particular,
      this means that the VFP registers cannot be located reliably in the
      signal frame when a multi_v7_defconfig kernel is run on the
      majority of platforms.
      
The cause is that uc_regspace[] is laid out statically based on the
kernel config, but whether to save/restore the iWMMXt registers has
to be decided at runtime.
      
      To minimise breakage of software that may assume a fixed layout,
      this patch emits a dummy block of the same size as iwmmxt_sigframe,
      for non-iWMMXt threads.  However, the magic and size of this block
      are now filled in to help parsers skip over it.  A new DUMMY_MAGIC
      is defined for this purpose.
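A hedged sketch of filling in that placeholder header (using the
generic __put_user(); field names are assumptions):

  /* non-iWMMXt thread: emit a block the same size as
   * iwmmxt_sigframe, but tagged so userspace can skip it */
  err |= __put_user(DUMMY_MAGIC, &frame->magic);
  err |= __put_user(sizeof(struct iwmmxt_sigframe), &frame->size);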
      
      It is probably legitimate (if non-portable) for userspace to
      manufacture its own sigframe for sigreturn, and there is no obvious
      reason why userspace should be required to insert a DUMMY_MAGIC
      block when running on non-iWMMXt hardware, when omitting it has
      worked just fine forever in other configurations.  So in this case,
      sigreturn does not require this block to be present.
Reported-by: Edmund Grimley-Evans <Edmund.Grimley-Evans@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
• ARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors · 26958355
  Committed by Dave Martin
preserve_iwmmxt_context() and restore_iwmmxt_context() lack __user
annotations on their arguments pointing to the user signal frame.
      
There does not appear to be a bug here, but this omission is
inconsistent with the crunch and vfp sigframe access functions.
      
      This patch adds the annotations, for consistency.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>