1. 01 July 2017 (2 commits)
  2. 30 June 2017 (8 commits)
    • x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug · 8eabf42a
      Committed by Baoquan He
      Kernel text KASLR is separated into physical address and virtual
      address randomization. For virtual address randomization, we only
      randomize to get an offset between 16M and KERNEL_IMAGE_SIZE. So the
      initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, not the
      original kernel loading address 'output'.
      
      The bug will cause a kernel boot failure if the kernel is loaded at a
      position other than the 16M address decided at compile time.
      Kexec/kdump is one such practical case.
      
      To fix it, just assign LOAD_PHYSICAL_ADDR to 'virt_addr' as its initial
      value.
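      
      As a hedged illustration only (not the actual arch/x86/boot/compressed
      code; the load address below is hypothetical), the point is that the
      virtual offset starts from the compile-time link address rather than
      from wherever kexec happened to place the kernel:
      
        /* Standalone sketch of the initialisation the fix describes. */
        #include <stdio.h>
        
        #define LOAD_PHYSICAL_ADDR 0x1000000UL            /* 16M, decided at build time */
        
        int main(void)
        {
                unsigned long output = 0x20000000UL;           /* hypothetical kexec load address */
                unsigned long virt_addr = LOAD_PHYSICAL_ADDR;  /* the fix: not 'output' */
        
                printf("physical load address: %#lx\n", output);
                printf("initial virtual offset: %#lx\n", virt_addr);
                return 0;
        }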
      Tested-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8391c73c ("x86/KASLR: Randomize virtual address separately")
      Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8eabf42a
    • x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization · b892cb87
      Committed by Baoquan He
      For kernel text KASLR, the virtual address is confined to a 1G area,
      [0xffffffff80000000, 0xffffffffc0000000). For the implementation of
      virtual address randomization, we only randomize to get an offset
      between 16M and 1G, then add this offset to the starting address,
      0xffffffff80000000. Here 16M is the offset decided at linking stage.
      So the local variable 'virt_addr', which represents the offset, plus
      the kernel output size cannot exceed KERNEL_IMAGE_SIZE.
      
      Add a debug check for the offset. If it is out of bounds, print an
      error message and hang.
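      
      A minimal sketch of such a check, assuming simplified names (the real
      boot code uses its own error reporting and hangs rather than exiting):
      
        #include <stdio.h>
        #include <stdlib.h>
        
        #define KERNEL_IMAGE_SIZE (1024UL * 1024 * 1024)      /* 1G window */
        
        /* Hang (here: exit) if the randomized virtual offset plus the kernel
         * output size would exceed the KERNEL_IMAGE_SIZE window. */
        static void check_virt_addr(unsigned long virt_addr, unsigned long output_size)
        {
                if (virt_addr + output_size > KERNEL_IMAGE_SIZE) {
                        fprintf(stderr, "Wrong destination virtual address\n");
                        exit(1);
                }
        }
        
        int main(void)
        {
                check_virt_addr(0x1000000UL, 64UL << 20);     /* 16M offset, fits */
                check_virt_addr(0x3f000000UL, 64UL << 20);    /* would overflow the window */
                return 0;
        }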
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b892cb87
    • MIPS: Avoid accidental raw backtrace · 85423636
      Committed by James Hogan
      Since commit 81a76d71 ("MIPS: Avoid using unwind_stack() with
      usermode"), show_backtrace() invokes the raw backtracer when
      cp0_status & ST0_KSU indicates user mode, in order to fix issues on
      EVA kernels where user and kernel address spaces overlap.
      
      However, this is used by show_stack(), which creates its own pt_regs on
      the stack and leaves cp0_status uninitialised in most code paths. This
      results in non-deterministic use of the raw backtracer depending on the
      previous stack content.
      
      show_stack() deals exclusively with kernel mode stacks anyway, so
      explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure
      we get a useful backtrace.
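      
      A user-space analogue of the problem, with hypothetical names, just to
      show why the explicit field initialisation matters:
      
        /* A local struct on the stack has indeterminate contents unless the
         * fields read later are explicitly initialised, as the fix does for
         * regs.cp0_status. */
        #include <stdio.h>
        
        struct fake_pt_regs {
                unsigned long cp0_status;
                unsigned long gpr[32];
        };
        
        int main(void)
        {
                struct fake_pt_regs regs = { .cp0_status = 0 };  /* KSU_KERNEL == 0 */
        
                printf("cp0_status = %lu\n", regs.cp0_status);
                return 0;
        }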
      
      Fixes: 81a76d71 ("MIPS: Avoid using unwind_stack() with usermode")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.15+
      Patchwork: https://patchwork.linux-mips.org/patch/16656/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      85423636
    • MIPS: Perform post-DMA cache flushes on systems with MAARs · cad482c1
      Committed by Paul Burton
      Recent CPUs from Imagination Technologies such as the I6400 or P6600 are
      able to speculatively fetch data from memory into caches. This means
      that if used in a system with non-coherent DMA they require that caches
      be invalidated after a device performs DMA, and before the CPU reads the
      DMA'd data, in order to ensure that stale values weren't speculatively
      prefetched.
      
      Such CPUs also introduced Memory Accessibility Attribute Registers
      (MAARs) in order to control the regions in which they are allowed to
      speculate. Thus we can use the presence of MAARs as a good indication
      that the CPU requires the above cache maintenance. Use the presence of
      MAARs to determine the result of cpu_needs_post_dma_flush() in the
      default case, in order to handle these recent CPUs correctly.
      
      Note that the return type of cpu_needs_post_dma_flush() is changed to
      bool, such that it's clearer what's happening when cpu_has_maar is cast
      to bool for the return value. If this patch were backported to a
      pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int
      we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is
      currently 1ull<<30, which still gives a non-zero value when truncated
      to an int, but even so the implicit conversion from long long int to
      bool makes what will happen clearer than the implicit conversion from
      long long int to int would. The bool return type also fits this usage
      better semantically, so it seems like an all-round win.
      
      Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting
      the return type change.
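      
      The int-versus-bool point can be seen in isolation with this standalone
      sketch (not kernel code):
      
        /* 1ull << 34 truncated to int is 0, but converted to bool it is true,
         * which is why bool is the safer return type for a flag held in the
         * upper bits of a long long. */
        #include <stdbool.h>
        #include <stdio.h>
        
        int main(void)
        {
                unsigned long long maar_flag = 1ULL << 34;   /* pre-v4.7 style bit */
        
                int as_int = maar_flag;      /* low 32 bits only: 0 */
                bool as_bool = maar_flag;    /* any non-zero value becomes true */
        
                printf("as int:  %d\n", as_int);
                printf("as bool: %d\n", as_bool);
                return 0;
        }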
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Reviewed-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Ed Blake <ed.blake@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16363/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      cad482c1
    • MIPS: Fix IRQ tracing & lockdep when rescheduling · d8550860
      Committed by Paul Burton
      When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
      from arch/mips/kernel/entry.S we disable interrupts. This is true
      regardless of whether we reach work_resched from syscall_exit_work,
      resume_userspace or by looping after calling schedule(). Although we
      disable interrupts in these paths we don't call trace_hardirqs_off()
      before calling into C code which may acquire locks, and we therefore
      leave lockdep with an inconsistent view of whether interrupts are
      disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are
      both enabled.
      
      Without tracing this interrupt state lockdep will print warnings such
      as the following once a task returns from a syscall via
      syscall_exit_partial with TIF_NEED_RESCHED set:
      
      [   49.927678] ------------[ cut here ]------------
      [   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
      [   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
      [   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
      [   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
      [   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
      [   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
      [   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
      [   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
      [   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
      [   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
      [   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
      [   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
      [   50.072327]         ...
      [   50.076087] Call Trace:
      [   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
      [   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
      [   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
      [   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
      [   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
      [   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
      [   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
      [   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
      [   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
      [   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
      [   50.148405] possible reason: unannotated irqs-off.
      [   50.154600] irq event stamp: 400463
      [   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
      [   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
      [   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
      [   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128
      
      Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
      when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
      schedule() following the work_resched label because:
      
       1) Interrupts are disabled regardless of the path we take to reach
          work_resched() & schedule().
      
       2) Performing the tracing here avoids the need to do it in paths which
          disable interrupts but don't call out to C code before hitting a
          path which uses the RESTORE_SOME macro that will call
          trace_hardirqs_on() or trace_hardirqs_off() as appropriate.
      
      We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
      syscall_trace_leave() for similar reasons, ensuring that lockdep has a
      consistent view of state after we re-enable interrupts.
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org>
      Patchwork: https://patchwork.linux-mips.org/patch/15385/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      d8550860
    • MIPS: pm-cps: Drop manual cache-line alignment of ready_count · 161c51cc
      Committed by Paul Burton
      We allocate memory for a ready_count variable per-CPU, which is accessed
      via a cached non-coherent TLB mapping to perform synchronisation between
      threads within the core using LL/SC instructions. In order to ensure
      that the variable is contained within its own data cache line we
      allocate 2 lines worth of memory & align the resulting pointer to a line
      boundary. This is however unnecessary, since kmalloc is guaranteed to
      return memory which is at least cache-line aligned (see
      ARCH_DMA_MINALIGN). Stop the redundant manual alignment.
      
      Besides cleaning up the code & avoiding needless work, this has the side
      effect of avoiding an arithmetic error found by Bryan on 64-bit systems
      due to the 32-bit size of the former dlinesz. This led to the
      ready_count variable having its upper 32 bits erroneously cleared on
      MIPS64 kernels, causing problems when ready_count was later used on
      MIPS64 via cpuidle.
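      
      The kind of 64-bit arithmetic error described above can be reproduced in
      a standalone sketch (values are hypothetical, not the original pm-cps
      code):
      
        /* Aligning a 64-bit address with a mask derived from a 32-bit line
         * size clears the upper 32 bits of the address. */
        #include <stdint.h>
        #include <stdio.h>
        
        int main(void)
        {
                uint32_t dlinesz = 32;                  /* 32-bit cache line size */
                uint64_t addr = 0xffffffff80123456ULL;  /* 64-bit kernel pointer */
        
                uint64_t bad  = (addr + dlinesz - 1) & ~(dlinesz - 1);           /* 32-bit mask */
                uint64_t good = (addr + dlinesz - 1) & ~(uint64_t)(dlinesz - 1); /* 64-bit mask */
        
                printf("bad:  %#llx\n", (unsigned long long)bad);
                printf("good: %#llx\n", (unsigned long long)good);
                return 0;
        }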
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 3179d37e ("MIPS: pm-cps: add PM state entry code for CPS systems")
      Reported-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Tested-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org> # v3.16+
      Patchwork: https://patchwork.linux-mips.org/patch/15383/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      161c51cc
    • ARM: 8685/1: ensure memblock-limit is pmd-aligned · 9e25ebfe
      Committed by Doug Berger
      The pmd containing memblock_limit is cleared by prepare_page_table(),
      which creates the opportunity for early_alloc() to allocate unmapped
      memory if memblock_limit is not pmd-aligned, causing a boot-time hang.
      
      Commit 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      attempted to resolve this problem, but there is a path through the
      adjust_lowmem_bounds() routine where if all memory regions start and
      end on pmd-aligned addresses the memblock_limit will be set to
      arm_lowmem_limit.
      
      Since arm_lowmem_limit can be affected by the vmalloc early parameter,
      the value of arm_lowmem_limit may not be pmd-aligned. This commit
      corrects this oversight such that memblock_limit is always rounded
      down to pmd-alignment.
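      
      A hedged sketch of the rounding; the 2MB PMD_SIZE and the limit value
      are assumptions for illustration, and the kernel uses its own
      round_down() helper:
      
        #include <stdio.h>
        
        #define PMD_SIZE (2UL * 1024 * 1024)
        #define round_down(x, y) ((x) & ~((y) - 1))
        
        int main(void)
        {
                unsigned long arm_lowmem_limit = 0x2f7fe000UL;  /* not pmd-aligned */
                unsigned long memblock_limit = round_down(arm_lowmem_limit, PMD_SIZE);
        
                printf("memblock_limit = %#lx\n", memblock_limit);
                return 0;
        }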
      
      Fixes: 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Suggested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      9e25ebfe
    • perf/x86/intel/uncore: Fix wrong box pointer check · 80c65fdb
      Committed by Kan Liang
      We should not init a NULL box; it will cause a system crash. The issue
      looks like it was caused by a typo.
      
      This was not noticed because there is currently no NULL box. Also, most
      boxes are enabled by default, so the init code is not critical.
      
      Fixes: fff4b87e ("perf/x86/intel/uncore: Make package handling more robust")
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
      80c65fdb
  3. 29 June 2017 (1 commit)
  4. 28 June 2017 (2 commits)
  5. 26 June 2017 (2 commits)
    • powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user() · d6bd8194
      Committed by Michael Ellerman
      Larry Finger reported that his Powerbook G4 was no longer booting with
      v4.12-rc; userspace was up but gave weird errors such as:
      
        udevd[64]: starting version 175
        udevd[64]: Unable to receive ctrl message: Bad address.
        modprobe: chdir(4.12-rc1): No such file or directory
      
      He bisected the problem to commit 3448890c ("powerpc: get rid of zeroing,
      switch to RAW_COPY_USER").
      
      Al identified that the problem is actually a miscompilation by GCC 4.6.3, which
      is exposed by the above commit.
      
      Al also pointed out that inlining copy_to/from_user() is probably of little or
      no benefit, which is correct. Using Anton's copy_to_user benchmark, with a
      pathological single byte copy, we see a small increase in performance
      by *removing* inlining:
      
        Before (inlined):
        # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
        real	0m22.063s
        real	0m22.059s
        real	0m22.076s
      
        After:
        # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
        real	0m21.325s
        real	0m21.299s
        real	0m21.364s
      
      So as a small performance improvement and to avoid the miscompilation, drop
      inlining copy_to/from_user() on 32-bit.
      
      Fixes: 3448890c ("powerpc: get rid of zeroing, switch to RAW_COPY_USER")
      Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
      Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d6bd8194
    • x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD · 98fe3633
      Committed by Jérôme Glisse
      Since commit:
      
        af2cf278 ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()")
      
      we no longer free PUDs so that we do not have to synchronize
      all PGDs on hot-remove/vfree().
      
      But the new 5-level page table patchset reverted that for 4-level
      page tables, in the following commit:
      
        f2a6a705 ("x86: Convert the rest of the code to support p4d_t")
      
      This patch undoes that damage by disabling free_pud() when we are in the
      4-level page table case, thus avoiding the BUG_ON() after hot-remove.
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      [ Clarified the changelog and the code comments. ]
      Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170624180514.3821-1-jglisse@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      98fe3633
  6. 24 June 2017 (1 commit)
  7. 23 June 2017 (2 commits)
    • powerpc/64: Initialise thread_info for emergency stacks · 34f19ff1
      Committed by Nicholas Piggin
      Emergency stacks have their thread_info mostly uninitialised, which in
      particular means garbage preempt_count values.
      
      Emergency stack code runs with interrupts disabled entirely, and is
      used very rarely, so this has gone unnoticed so far. It was found by a
      proposed new powerpc watchdog that takes a soft-NMI directly from the
      masked_interrupt handler and uses the emergency stack. That crashed
      at BUG_ON(in_nmi()) in nmi_enter(), and the preempt_count()s were found
      to be garbage.
      
      To fix this, zero the entire THREAD_SIZE allocation, and initialize
      the thread_info.
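      
      A hedged user-space analogue of the approach (the real change lives in
      arch/powerpc/kernel/setup_64.c; names here are illustrative):
      
        /* Zero the whole stack allocation so thread_info fields such as
         * preempt_count start at 0 instead of stack garbage. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        
        #define THREAD_SIZE (16 * 1024)
        
        struct fake_thread_info {
                int preempt_count;
                int cpu;
        };
        
        int main(void)
        {
                void *stack = malloc(THREAD_SIZE);
                if (!stack)
                        return 1;
                memset(stack, 0, THREAD_SIZE);         /* zero the entire allocation */
        
                struct fake_thread_info *ti = stack;   /* thread_info sits at the base */
                ti->preempt_count = 0;                 /* 0, not HARDIRQ_OFFSET */
                ti->cpu = 0;
        
                printf("preempt_count = %d\n", ti->preempt_count);
                free(stack);
                return 0;
        }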
      
      Cc: stable@vger.kernel.org
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Move it all into setup_64.c, use a function not a macro. Fix
            crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      34f19ff1
    • gcc-plugins: Add the randstruct plugin · 313dd1b6
      Committed by Kees Cook
      This randstruct plugin is modified from Brad Spengler/PaX Team's code
      in the last public patch of grsecurity/PaX based on my understanding
      of the code. Changes or omissions from the original code are mine and
      don't reflect the original grsecurity/PaX code.
      
      The randstruct GCC plugin randomizes the layout of selected structures
      at compile time, as a probabilistic defense against attacks that need to
      know the layout of structures within the kernel. This is most useful for
      "in-house" kernel builds where neither the randomization seed nor other
      build artifacts are made available to an attacker. While less useful for
      distribution kernels (where the randomization seed must be exposed for
      third party kernel module builds), it still has some value there since now
      all kernel builds would need to be tracked by an attacker.
      
      In more performance sensitive scenarios, GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
      can be selected to make a best effort to restrict randomization to
      cacheline-sized groups of elements, and will not randomize bitfields. This
      comes at the cost of reduced randomization.
      
      Two annotations are defined, __randomize_layout and __no_randomize_layout,
      which respectively tell the plugin to either randomize or not to
      randomize instances of the struct in question. Follow-on patches enable
      the auto-detection logic for selecting structures for randomization
      that contain only function pointers. It is disabled here to assist with
      bisection.
      
      Since any randomized structs must be initialized using designated
      initializers, __randomize_layout includes the __designated_init annotation
      even when the plugin is disabled so that all builds will require
      the needed initialization. (With the plugin enabled, annotations for
      automatically chosen structures are marked as well.)
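      
      Designated initializers are plain C; for a struct annotated with
      __randomize_layout (the struct below is hypothetical), initialization
      must name each field so it stays correct whatever layout is chosen:
      
        /* Positional initializers would silently break if fields were
         * reordered; designated initializers work with any layout. */
        #include <stdio.h>
        
        struct ops {            /* imagine: struct ops { ... } __randomize_layout; */
                int (*open)(void);
                int (*close)(void);
                int flags;
        };
        
        static int my_open(void)  { return 0; }
        static int my_close(void) { return 0; }
        
        static const struct ops my_ops = {
                .open  = my_open,       /* named, so independent of field order */
                .close = my_close,
                .flags = 1,
        };
        
        int main(void)
        {
                printf("flags = %d\n", my_ops.flags);
                return my_ops.open() + my_ops.close();
        }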
      
      The main differences between this implementation and grsecurity are:
      - disable automatic struct selection (to be enabled in follow-up patch)
      - add designated_init attribute at runtime and for manual marking
      - clarify debugging output to differentiate bad cast warnings
      - add whitelisting infrastructure
      - support gcc 7's DECL_ALIGN and DECL_MODE changes (Laura Abbott)
      - raise minimum required GCC version to 4.7
      
      Earlier versions of this patch series were ported by Michael Leibowitz.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      313dd1b6
  8. 22 June 2017 (4 commits)
    • KVM: x86: fix singlestepping over syscall · c8401dda
      Committed by Paolo Bonzini
      TF is handled a bit differently for syscall and sysret, compared
      to the other instructions: TF is checked after the instruction completes,
      so that the OS can disable #DB at a syscall by adding TF to FMASK.
      When the sysret is executed the #DB is taken "as if" the syscall insn
      just completed.
      
      KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
      Fix the behavior, otherwise you could get #DB on a user stack which is not
      nice.  This does not affect Linux guests, as they use an IST or task gate
      for #DB.
      
      This fixes CVE-2017-7518.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      c8401dda
    • powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD · bbd5ff50
      Committed by Alistair Popple
      NPU2 requires an extra explicit flush to an active GPU PID when
      sending address translation shoot downs (ATSDs) to reliably flush the
      GPU TLB. This patch adds just such a flush at the end of each sequence
      of ATSDs.
      
      We can safely use PID 0 which is always reserved and active on the
      GPU. PID 0 is only used for init_mm which will never be a user mm on
      the GPU. To enforce this we add a check in pnv_npu2_init_context()
      just in case someone tries to use PID 0 on the GPU.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      [mpe: Use true/false for bool literals]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bbd5ff50
    • KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows · addb63c1
      Committed by Heiko Carstens
      For real-space designation asces the asce origin part is only a token.
      The asce token origin must not be used to generate an effective
      address for storage references. This however is erroneously done
      within kvm_s390_shadow_tables().
      
      Furthermore within the same function the wrong parts of virtual
      addresses are used to generate a corresponding real address
      (e.g. the region second index is used as region first index).
      
      Both of the above can result in incorrect address translations. The
      translation was only correct for real-space designations with a token
      origin of zero and addresses below one megabyte.
      
      Furthermore replace a "!asce.r" statement with a "!*fake" statement to
      make it more obvious that a specific condition has nothing to do with
      the architecture, but with the fake handling of real space designations.
      
      Fixes: 3218f709 ("s390/mm: support real-space for gmap shadows")
      Cc: David Hildenbrand <david@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      addb63c1
    • perf/x86/intel: Add 1G DTLB load/store miss support for SKL · fb3a5055
      Committed by Kan Liang
      The current DTLB load/store miss events (0x608/0x649) only count the
      4K, 2M and 4M page sizes.
      Extend the events to support any page size (4K/2M/4M/1G).
      
      The complete DTLB load/store miss events are:
      
        DTLB_LOAD_MISSES.WALK_COMPLETED		0xe08
        DTLB_STORE_MISSES.WALK_COMPLETED		0xe49
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb3a5055
  9. 20 June 2017 (3 commits)
    • KVM: MIPS: Fix maybe-uninitialized build failure · e27a9eca
      Committed by James Cowgill
      This commit fixes a "maybe-uninitialized" build failure in
      arch/mips/kvm/tlb.c when KVM, DYNAMIC_DEBUG and JUMP_LABEL are all
      enabled. The failure is:
      
      In file included from ./include/linux/printk.h:329:0,
                       from ./include/linux/kernel.h:13,
                       from ./include/asm-generic/bug.h:15,
                       from ./arch/mips/include/asm/bug.h:41,
                       from ./include/linux/bug.h:4,
                       from ./include/linux/thread_info.h:11,
                       from ./include/asm-generic/current.h:4,
                       from ./arch/mips/include/generated/asm/current.h:1,
                       from ./include/linux/sched.h:11,
                       from arch/mips/kvm/tlb.c:13:
      arch/mips/kvm/tlb.c: In function ‘kvm_mips_host_tlb_inv’:
      ./include/linux/dynamic_debug.h:126:3: error: ‘idx_kernel’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         __dynamic_pr_debug(&descriptor, pr_fmt(fmt), \
         ^~~~~~~~~~~~~~~~~~
      arch/mips/kvm/tlb.c:169:16: note: ‘idx_kernel’ was declared here
        int idx_user, idx_kernel;
                      ^~~~~~~~~~
      
      There is a similar error relating to "idx_user". Both errors were
      observed with GCC 6.
      
      As far as I can tell, it is impossible for either idx_user or idx_kernel
      to be uninitialized when they are later read in the calls to kvm_debug,
      but to satisfy the compiler, add zero initializers to both variables.
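      
      The shape of the fix, sketched standalone with hypothetical names (the
      real variables are idx_user and idx_kernel in arch/mips/kvm/tlb.c):
      
        /* GCC cannot always prove that a conditionally-assigned variable is
         * set before use; a zero initializer silences -Wmaybe-uninitialized
         * without changing behaviour. */
        #include <stdio.h>
        
        static int lookup(int key, int enabled)
        {
                int idx = 0;            /* zero-init to satisfy the compiler */
        
                if (enabled)
                        idx = key * 2;  /* the only path that really assigns it */
        
                if (enabled)
                        printf("idx = %d\n", idx);
                return idx;
        }
        
        int main(void)
        {
                return lookup(21, 1) == 42 ? 0 : 1;
        }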
      Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
      Fixes: 57e3869c ("KVM: MIPS/TLB: Generalise host TLB invalidate to kernel ASID")
      Cc: <stable@vger.kernel.org> # 4.11+
      Acked-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      e27a9eca
    • arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW · dbb236c1
      Committed by Will Deacon
      Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
      49eea433 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
      clock_gettime() vDSO"). Noticing that the core timekeeping code
      never set tkr_raw.xtime_nsec, the vDSO implementation didn't
      bother exposing it via the data page and instead took the
      unshifted tk->raw_time.tv_nsec value which was then immediately
      shifted left in the vDSO code.
      
      Unfortunately, by accelerating the MONOTONIC_RAW clockid, it
      uncovered potential 1ns time inconsistencies caused by the
      timekeeping core not handling sub-ns resolution.
      
      Now that the core code has been fixed and is actually setting
      tkr_raw.xtime_nsec, we need to take that into account in the
      vDSO by adding it to the shifted raw_time value, in order to
      fix the user-visible inconsistency. Rather than do that at each
      use (and expand the data page in the process), instead perform
      the shift/addition operation when populating the data page and
      remove the shift from the vDSO code entirely.
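      
      The shift handling can be illustrated generically (a simplified
      fixed-point sketch with made-up values, not the arm64 vDSO data
      structures):
      
        /* The timekeeping core keeps nanoseconds left-shifted by 'shift' for
         * sub-ns precision.  Doing the add and the right shift once, when the
         * data page is populated, means the vDSO reader no longer shifts. */
        #include <stdint.h>
        #include <stdio.h>
        
        int main(void)
        {
                unsigned int shift = 8;
                uint64_t raw_sec = 100;
                uint64_t xtime_nsec = 999999999ULL << shift;   /* shifted sub-ns value */
                uint64_t delta_nsec = 3ULL << shift;           /* also shifted */
        
                /* done once at update time, stored unshifted for readers: */
                uint64_t nsec = (xtime_nsec + delta_nsec) >> shift;
        
                printf("monotonic raw: %llu.%09llu\n",
                       (unsigned long long)(raw_sec + nsec / 1000000000ULL),
                       (unsigned long long)(nsec % 1000000000ULL));
                return 0;
        }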
      
      [jstultz: minor whitespace tweak, tried to improve commit
       message to make it more clear this fixes a regression]
      Reported-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      dbb236c1
    • Fix English in description of GCC_PLUGIN_STRUCTLEAK · f136e090
      Committed by Jean Delvare
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: c61f13ea ("gcc-plugins: Add structleak for more stack initialization")
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      f136e090
  10. 19 June 2017 (1 commit)
    • mm: larger stack guard gap, between vmas · 1be7107f
      Committed by Hugh Dickins
      The stack guard page is a useful feature to reduce the risk of the stack
      smashing into a different mapping. We have been using a single-page gap,
      which is sufficient to prevent the stack being adjacent to a different
      mapping. But this seems to be insufficient in light of stack usage in
      userspace. E.g. glibc uses alloca() allocations as large as 64kB in many
      commonly used functions. Others use constructs like gid_t
      buffer[NGROUPS_MAX], which is 256kB, or stack strings with MAX_ARG_STRLEN.
      
      This is especially dangerous for suid binaries and the default unlimited
      stack size limit, because those applications can be tricked into
      consuming a large portion of the stack, and a single glibc call could
      then jump over the guard page. These attacks are not theoretical,
      unfortunately.
      
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
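      
      For example, since the option takes page units, booting a 4k-page system
      with the following keeps the 1MB default explicit (256 pages x 4kB =
      1MB; illustrative command-line fragment only):
      
        stack_guard_gap=256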
      
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and the vma tree's subtree_gap support for that.
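      
      A hedged sketch of the vm_start_gap() idea, using a toy struct rather
      than the real vm_area_struct:
      
        /* For a downward-growing stack vma, the start used by gap checks is
         * the vma start minus the guard gap, clamped at zero on underflow. */
        #include <stdio.h>
        
        #define VM_GROWSDOWN 0x1UL
        
        struct toy_vma {
                unsigned long vm_start;
                unsigned long vm_end;
                unsigned long vm_flags;
        };
        
        static unsigned long stack_guard_gap = 256UL << 12;   /* 1MB with 4k pages */
        
        static unsigned long vm_start_gap(const struct toy_vma *vma)
        {
                unsigned long start = vma->vm_start;
        
                if (vma->vm_flags & VM_GROWSDOWN) {
                        start -= stack_guard_gap;
                        if (start > vma->vm_start)     /* wrapped below zero */
                                start = 0;
                }
                return start;
        }
        
        int main(void)
        {
                struct toy_vma stack = { 0x7ffff0000000UL, 0x7ffff0021000UL, VM_GROWSDOWN };
        
                printf("gap-adjusted start: %#lx\n", vm_start_gap(&stack));
                return 0;
        }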
      Original-patch-by: Oleg Nesterov <oleg@redhat.com>
      Original-patch-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1be7107f
  11. 16 June 2017 (8 commits)
    • powerpc/perf: Fix oops when kthread execs user process · bf05fc25
      Committed by Ravi Bangoria
      When a kthread calls call_usermodehelper() the steps are:
        1. allocate current->mm
        2. load_elf_binary()
        3. populate current->thread.regs
      
      While doing this, interrupts are not disabled. If there is a perf
      interrupt in the middle of this process (i.e. step 1 has completed but
      step 3 has not yet been reached) and perf tries to read userspace regs,
      the kernel oopses with the following log:
      
        Unable to handle kernel paging request for data at address 0x00000000
        Faulting instruction address: 0xc0000000000da0fc
        ...
        Call Trace:
        perf_output_sample_regs+0x6c/0xd0
        perf_output_sample+0x4e4/0x830
        perf_event_output_forward+0x64/0x90
        __perf_event_overflow+0x8c/0x1e0
        record_and_restart+0x220/0x5c0
        perf_event_interrupt+0x2d8/0x4d0
        performance_monitor_exception+0x54/0x70
        performance_monitor_common+0x158/0x160
        --- interrupt: f01 at avtab_search_node+0x150/0x1a0
            LR = avtab_search_node+0x100/0x1a0
        ...
        load_elf_binary+0x6e8/0x15a0
        search_binary_handler+0xe8/0x290
        do_execveat_common.isra.14+0x5f4/0x840
        call_usermodehelper_exec_async+0x170/0x210
        ret_from_kernel_thread+0x5c/0x7c
      
      Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
      pt_regs are not set.
      
      Fixes: ed4a4ef8 ("powerpc/perf: Add support for sampling interrupt register state")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      bf05fc25
    • powerpc/64s: Handle data breakpoints in Radix mode · d89ba535
      Committed by Naveen N. Rao
      On Power9, trying to use data breakpoints throws the splat shown
      below. This is because the check for a data breakpoint in DSISR is in
      do_hash_page(), which is not called when in Radix mode.
      
        Unable to handle kernel paging request for data at address 0xc000000000e19218
        Faulting instruction address: 0xc0000000001155e8
        cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
        pc: c0000000001155e8: find_pid_ns+0x48/0xe0
        lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
        sp: c0000000ef1e7da0
        msr: 9000000000009033
        dar: c000000000e19218
        dsisr: 400000
      
      Move the check to handle_page_fault() so as to catch data breakpoints
      in both Hash and Radix MMU modes.
      
      We have to change the check in do_hash_page() against 0xa410 to use
      0xa450, so as to include the value of (DSISR_DABRMATCH << 16).
      
      There are two sites that call handle_page_fault() when in Radix, both
      already pass DSISR in r4.
      
      Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Fix the fall-through case on hash, we need to reload DSISR]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d89ba535
    • powerpc/kprobes: Skip livepatch_handler() for jprobes · c05b8c44
      Committed by Naveen N. Rao
      ftrace_caller() depends on a modified regs->nip to detect if a certain
      function has been livepatched. However, with KPROBES_ON_FTRACE, it is
      possible for regs->nip to have been modified by the kprobes pre_handler
      (jprobes, for instance). In this case, we do not want to invoke the
      livepatch_handler so as not to consume the livepatch stack.
      
      To distinguish between the two (kprobes and livepatch), we check if
      there is an active kprobe on the current function. If there is, then we
      know for sure that it must have modified the NIP as we don't support
      livepatching a kprobe'd function. In this case, we simply skip the
      livepatch_handler and branch to the new NIP. Otherwise, the
      livepatch_handler is invoked.
      
      Fixes: ead514d5 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c05b8c44
    • powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS · a4979a7e
      Committed by Naveen N. Rao
      For DYNAMIC_FTRACE_WITH_REGS, we should be passing in the original set
      of registers in pt_regs, to capture the state _before_ ftrace_caller.
      However, we are instead passing the stack pointer *after* allocating a
      stack frame in ftrace_caller. Fix this by saving the proper value of r1
      in pt_regs. Also, use SAVE_10GPRS() to simplify the code.
      
      Fixes: 15308664 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a4979a7e
    • powerpc/kprobes: Pause function_graph tracing during jprobes handling · a9f8553e
      Committed by Naveen N. Rao
      This fixes a crash when function_graph and jprobes are used together.
      This is essentially commit 237d28db ("ftrace/jprobes/x86: Fix
      conflict between jprobes and function graph tracing"), but for powerpc.
      
      Jprobes breaks function_graph tracing since the jprobe hook needs to use
      jprobe_return(), which never returns back to the hook, but instead to
      the original jprobe'd function. The solution is to momentarily pause
      function_graph tracing before invoking the jprobe hook and re-enable it
      when returning back to the original jprobe'd function.
      
      Fixes: 6794c782 ("powerpc64: port of the function graph tracer")
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a9f8553e
    • powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path · a093c92d
      Committed by Alexey Kardashevskiy
      When trapped on WARN_ON(), report_bug() is expected to return
      BUG_TRAP_TYPE_WARN so the caller will increment NIP by 4 and continue.
      The __builtin_constant_p() path of PPC's WARN_ON() calls (indirectly)
      __WARN_FLAGS(), which has BUGFLAG_WARNING set; however, the other branch
      does not, which makes report_bug() report a bug rather than a warning.
      
      Fixes: f26dee15 ("debug: Avoid setting BUGFLAG_WARNING twice")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a093c92d
    • KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 · 3d3efb68
      Committed by Paul Mackerras
      POWER9 DD1 has an erratum where writing to the TBU40 register, which
      is used to apply an offset to the timebase, can cause the timebase to
      lose counts.  This results in the timebase on some CPUs getting out of
      sync with other CPUs, which then results in misbehaviour of the
      timekeeping code.
      
      To work around the problem, we make KVM ignore the timebase offset for
      all guests on POWER9 DD1 machines.  This means that live migration
      cannot be supported on POWER9 DD1 machines.
      
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      3d3efb68
    • KVM: PPC: Book3S HV: Save/restore host values of debug registers · 7ceaa6dc
      Committed by Paul Mackerras
      At present, HV KVM on POWER8 and POWER9 machines loses any instruction
      or data breakpoint set in the host whenever a guest is run.
      Instruction breakpoints are currently only used by xmon, but ptrace
      and the perf_event subsystem can set data breakpoints as well as xmon.
      
      To fix this, we save the host values of the debug registers (CIABR,
      DAWR and DAWRX) before entering the guest and restore them on exit.
      To provide space to save them in the stack frame, we expand the stack
      frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      7ceaa6dc
  12. 15 June 2017 (6 commits)
    • iommu/vt-d: Correctly disable Intel IOMMU force on · 7304e8f2
      Committed by Shaohua Li
      I made a mistake in commit bfd20f1c. We should skip the force on with the
      option enabled instead of vice versa. Not sure why this passed our
      performance test, sorry.
      
      Fixes: bfd20f1c ('x86, iommu/vt-d: Add an option to disable Intel IOMMU force on')
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      7304e8f2
    • powerpc/xive: Fix offset for store EOI MMIOs · 25642705
      Committed by Benjamin Herrenschmidt
      Architecturally we should apply a 0x400 offset for these. Not doing
      it will break future HW implementations.
      
      The offset of 0 is supposed to remain for "triggers" though not all
      sources support both trigger and store EOI, and in P9 specifically,
      some sources will treat 0 as a store EOI. But future chips will not.
      So this makes us use the properly architected offset which should work
      always.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      25642705
    • MIPS: .its targets depend on vmlinux · bcd7c45e
      Committed by Paul Burton
      The .its targets require information about the kernel binary, such as
      its entry point, which is extracted from the vmlinux ELF. We therefore
      require that the ELF is built before the .its files are generated.
      Declare this requirement in the Makefile such that make will ensure this
      is always the case, otherwise in corner cases we can hit issues as the
      .its is generated with an incorrect (either invalid or stale) entry
      point.
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: cf2a5e0b ("MIPS: Support generating Flattened Image Trees (.itb)")
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Patchwork: https://patchwork.linux-mips.org/patch/16179/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      bcd7c45e
    • MIPS: Fix bnezc/jialc return address calculation · 1a73d931
      Committed by Paul Burton
      The code handling the pop76 opcode (ie. bnezc & jialc instructions) in
      __compute_return_epc_for_insn() needs to set the value of $31 in the
      jialc case, which is encoded with rs = 0. However its check to
      differentiate bnezc (rs != 0) from jialc (rs = 0) was unfortunately
      backwards, meaning that if we emulate a bnezc instruction we clobber $31
      & if we emulate a jialc instruction it actually behaves like a jic
      instruction.
      
      Fix this by inverting the check of rs to match the way the instructions
      are actually encoded.
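      
      A standalone sketch of the corrected check (the rs field sits in bits
      25:21 of the instruction word; the instruction encoding below is a
      made-up example):
      
        /* For this opcode group, rs == 0 encodes jialc (which must set $31 to
         * the return address) and rs != 0 encodes bnezc (which must not). */
        #include <stdint.h>
        #include <stdio.h>
        
        int main(void)
        {
                uint32_t insn = 0xf8000010;              /* hypothetical pop76-format word */
                unsigned int rs = (insn >> 21) & 0x1f;
        
                if (rs == 0)
                        printf("jialc: set $31 to the return address\n");
                else
                        printf("bnezc: leave $31 alone\n");
                return 0;
        }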
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 28d6f93d ("MIPS: Emulate the new MIPS R6 BNEZC and JIALC instructions")
      Cc: stable <stable@vger.kernel.org> # v4.0+
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16178/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      1a73d931
    • KVM: PPC: Book3S HV: Preserve userspace HTM state properly · 46a704f8
      Committed by Paul Mackerras
      If userspace attempts to call the KVM_RUN ioctl when it has hardware
      transactional memory (HTM) enabled, the values that it has put in the
      HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
      guest values.  To fix this, we detect this condition and save those
      SPR values in the thread struct, and disable HTM for the task.  If
      userspace goes to access those SPRs or the HTM facility in future,
      a TM-unavailable interrupt will occur and the handler will reload
      those SPRs and re-enable HTM.
      
      If userspace has started a transaction and suspended it, we would
      currently lose the transactional state in the guest entry path and
      would almost certainly get a "TM Bad Thing" interrupt, which would
      cause the host to crash.  To avoid this, we detect this case and
      return from the KVM_RUN ioctl with an EINVAL error, with the KVM
      exit reason set to KVM_EXIT_FAIL_ENTRY.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      46a704f8
    • KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · 4c3bb4cc
      Committed by Paul Mackerras
      This restores several special-purpose registers (SPRs) to sane values
      on guest exit that were missed before.
      
      TAR and VRSAVE are readable and writable by userspace, and we need to
      save and restore them to prevent the guest from potentially affecting
      userspace execution (not that TAR or VRSAVE are used by any known
      program that uses the KVM_RUN ioctl).  We save/restore these
      in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.
      
      FSCR affects userspace execution in that it can prohibit access to
      certain facilities by userspace.  We restore it to the normal value
      for the task on exit from the KVM_RUN ioctl.
      
      IAMR is normally 0, and is restored to 0 on guest exit.  However,
      with a radix host on POWER9, it is set to a value that prevents the
      kernel from executing user-accessible memory.  On POWER9, we save
      IAMR on guest entry and restore it on guest exit to the saved value
      rather than 0.  On POWER8 we continue to set it to 0 on guest exit.
      
      PSPB is normally 0.  We restore it to 0 on guest exit to prevent
      userspace taking advantage of the guest having set it non-zero
      (which would allow userspace to set its SMT priority to high).
      
      UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
      the AMR from being used as a covert channel between userspace
      processes, since the AMR is not context-switched at present.
      
      Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
      Cc: stable@vger.kernel.org # v3.14+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      4c3bb4cc