1. 29 8月, 2017 2 次提交
  2. 25 8月, 2017 5 次提交
    • P
      KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() · 47c5310a
      Paul Mackerras 提交于
      Nixiaoming pointed out that there is a memory leak in
      kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd()
      fails; the memory allocated for the kvmppc_spapr_tce_table struct
      is not freed, and nor are the pages allocated for the iommu
      tables.  In addition, we have already incremented the process's
      count of locked memory pages, and this doesn't get restored on
      error.
      
      David Hildenbrand pointed out that there is a race in that the
      function checks early on that there is not already an entry in the
      stt->iommu_tables list with the same LIOBN, but an entry with the
      same LIOBN could get added between then and when the new entry is
      added to the list.
      
      This fixes all three problems.  To simplify things, we now call
      anon_inode_getfd() before placing the new entry in the list.  The
      check for an existing entry is done while holding the kvm->lock
      mutex, immediately before adding the new entry to the list.
      Finally, on failure we now call kvmppc_account_memlimit to
      decrement the process's count of locked memory pages.
      Reported-by: NNixiaoming <nixiaoming@huawei.com>
      Reported-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      47c5310a
    • E
      x86/mm: Fix use-after-free of ldt_struct · ccd5b323
      Eric Biggers 提交于
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Acked-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org> [v4.6+]
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ccd5b323
    • P
      KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 38cfd5e3
      Paolo Bonzini 提交于
      The host pkru is restored right after vcpu exit (commit 1be0e61c), so
      KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
      using the guest PKRU explicitly in fill_xsave and load_xsave.  This
      part is based on a patch by Junkang Fu.
      
      The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
      because it could have been changed by userspace since the last time
      it was saved, so skip loading it in kvm_load_guest_fpu.
      Reported-by: NJunkang Fu <junkang.fjk@alibaba-inc.com>
      Cc: Yang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      38cfd5e3
    • P
      KVM: x86: simplify handling of PKRU · b9dd21e1
      Paolo Bonzini 提交于
      Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
      simple comparison against the host value of the register.  The write of
      PKRU in addition can be skipped if the guest has not enabled the feature.
      Once we do this, we need not test OSPKE in the host anymore, because
      guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      The static PKU test is kept to elide the code on older CPUs.
      Suggested-by: NYang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b9dd21e1
    • P
      KVM: x86: block guest protection keys unless the host has them enabled · c469268c
      Paolo Bonzini 提交于
      If the host has protection keys disabled, we cannot read and write the
      guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
      the PKU cpuid bit in that case.
      
      This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      Fixes: 1be0e61c
      Cc: stable@vger.kernel.org
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c469268c
  3. 24 8月, 2017 8 次提交
  4. 23 8月, 2017 6 次提交
    • A
      ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms · dbeb0c8e
      Arnd Bergmann 提交于
      My previous patch fixed a link error for all at91 platforms when
      CONFIG_ARM_CPU_SUSPEND was not set, however this caused another
      problem on a configuration that enabled CONFIG_ARCH_AT91 but none
      of the individual SoCs, and that also enabled CPU_ARM720 as
      the only CPU:
      
      warning: (ARCH_AT91 && SOC_IMX23 && SOC_IMX28 && ARCH_PXA && MACH_MVEBU_V7 && SOC_IMX6 && ARCH_OMAP3 && ARCH_OMAP4 && SOC_OMAP5 && SOC_AM33XX && SOC_DRA7XX && ARCH_EXYNOS3 && ARCH_EXYNOS4 && EXYNOS5420_MCPM && EXYNOS_CPU_SUSPEND && ARCH_VEXPRESS_TC2_PM && ARM_BIG_LITTLE_CPUIDLE && ARM_HIGHBANK_CPUIDLE && QCOM_PM) selects ARM_CPU_SUSPEND which has unmet direct dependencies (ARCH_SUSPEND_POSSIBLE)
      arch/arm/kernel/sleep.o: In function `cpu_resume':
      (.text+0xf0): undefined reference to `cpu_arm720_suspend_size'
      arch/arm/kernel/suspend.o: In function `__cpu_suspend_save':
      suspend.c:(.text+0x134): undefined reference to `cpu_arm720_do_suspend'
      
      This improves the hack some more by only selecting ARM_CPU_SUSPEND
      for the part that requires it, and changing pm.c to drop the
      contents of unused init functions so we no longer refer to
      cpu_resume on at91 platforms that don't need it.
      
      Fixes: cc7a938f ("ARM: at91: select CONFIG_ARM_CPU_SUSPEND")
      Acked-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      dbeb0c8e
    • R
      x86/ioapic: Print the IRTE's index field correctly when enabling INTR · adfaf183
      raymond pang 提交于
      When enabling interrupt remap, IOAPIC's RTE contains the interrupt_index
      field of IRTE. This field is composed of the ->index and the ->index2 members
      of 'struct IR_IO_APIC_route_entry' - but what we print out currently only
      uses ->index.
      
      Fix it.
      Signed-off-by: NRaymond Pang <raymondpangxd@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: joro@8bytes.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/CAHG4imNDzpDyOVi7MByVrLQ%3DQFuOVqpzJ5F-Xs5z6OZphubj-Q@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      adfaf183
    • C
      arm64: kaslr: Adjust the offset to avoid Image across alignment boundary · a067d94d
      Catalin Marinas 提交于
      With 16KB pages and a kernel Image larger than 16MB, the current
      kaslr_early_init() logic for avoiding mappings across swapper table
      boundaries fails since increasing the offset by kimg_sz just moves the
      problem to the next boundary.
      
      This patch rounds the offset down to (1 << SWAPPER_TABLE_SHIFT) if the
      Image crosses a PMD_SIZE boundary.
      
      Fixes: afd0e5a8 ("arm64: kaslr: Fix up the kernel image alignment")
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      a067d94d
    • A
      arm64: kaslr: ignore modulo offset when validating virtual displacement · 4a23e56a
      Ard Biesheuvel 提交于
      In the KASLR setup routine, we ensure that the early virtual mapping
      of the kernel image does not cover more than a single table entry at
      the level above the swapper block level, so that the assembler routines
      involved in setting up this mapping can remain simple.
      
      In this calculation we add the proposed KASLR offset to the values of
      the _text and _end markers, and reject it if they would end up falling
      in different swapper table sized windows.
      
      However, when taking the addresses of _text and _end, the modulo offset
      (the physical displacement modulo 2 MB) is already accounted for, and
      so adding it again results in incorrect results. So disregard the modulo
      offset from the calculation.
      
      Fixes: 08cdac61 ("arm64: relocatable: deal with physically misaligned ...")
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Tested-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      4a23e56a
    • M
      arm64: mm: abort uaccess retries upon fatal signal · 289d07a2
      Mark Rutland 提交于
      When there's a fatal signal pending, arm64's do_page_fault()
      implementation returns 0. The intent is that we'll return to the
      faulting userspace instruction, delivering the signal on the way.
      
      However, if we take a fatal signal during fixing up a uaccess, this
      results in a return to the faulting kernel instruction, which will be
      instantly retried, resulting in the same fault being taken forever. As
      the task never reaches userspace, the signal is not delivered, and the
      task is left unkillable. While the task is stuck in this state, it can
      inhibit the forward progress of the system.
      
      To avoid this, we must ensure that when a fatal signal is pending, we
      apply any necessary fixup for a faulting kernel instruction. Thus we
      will return to an error path, and it is up to that code to make forward
      progress towards delivering the fatal signal.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NSteve Capper <steve.capper@arm.com>
      Tested-by: NSteve Capper <steve.capper@arm.com>
      Reviewed-by: NJames Morse <james.morse@arm.com>
      Tested-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      289d07a2
    • D
      arm64: fpsimd: Prevent registers leaking across exec · 09662210
      Dave Martin 提交于
      There are some tricky dependencies between the different stages of
      flushing the FPSIMD register state during exec, and these can race
      with context switch in ways that can cause the old task's regs to
      leak across.  In particular, a context switch during the memset() can
      cause some of the task's old FPSIMD registers to reappear.
      
      Disabling preemption for this small window would be no big deal for
      performance: preemption is already disabled for similar scenarios
      like updating the FPSIMD registers in sigreturn.
      
      So, instead of rearranging things in ways that might swap existing
      subtle bugs for new ones, this patch just disables preemption
      around the FPSIMD state flushing so that races of this type can't
      occur here.  This brings fpsimd_flush_thread() into line with other
      code paths.
      
      Cc: stable@vger.kernel.org
      Fixes: 674c242c ("arm64: flush FP/SIMD state correctly after execve()")
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      09662210
  5. 22 8月, 2017 1 次提交
    • T
      sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus() · 2dc77533
      Thomas Petazzoni 提交于
      When building the kernel for Sparc using gcc 7.x, the build fails
      with:
      
      arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
      arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
          cmd |= PCI_COMMAND_IO;
              ^~
      
      The simplified code looks like this:
      
      unsigned int cmd;
      [...]
      pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
      [...]
      cmd |= PCI_COMMAND_IO;
      
      I.e, the code assumes that pcic_read_config() will always initialize
      cmd. But it's not the case. Looking at pcic_read_config(), if
      bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
      not be initialized.
      
      As a simple fix, we initialize cmd to zero at the beginning of
      pcibios_fixup_bus.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2dc77533
  6. 21 8月, 2017 2 次提交
  7. 19 8月, 2017 2 次提交
  8. 18 8月, 2017 4 次提交
  9. 17 8月, 2017 5 次提交
  10. 16 8月, 2017 1 次提交
  11. 15 8月, 2017 2 次提交
  12. 14 8月, 2017 1 次提交
  13. 11 8月, 2017 1 次提交
    • D
      clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabled · adb4f11e
      Ding Tianhong 提交于
      On platforms with an arch timer erratum workaround, it's possible for
      arch_timer_reg_read_stable() to recurse into itself when certain
      tracing options are enabled, leading to stack overflows and related
      problems.
      
      For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are
      selected, it's possible to trigger this with:
      
      $ mount -t debugfs nodev /sys/kernel/debug/
      $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
      
      The problem is that in such cases, preempt_disable() instrumentation
      attempts to acquire a timestamp via trace_clock(), resulting in a call
      back to arch_timer_reg_read_stable(), and hence recursion.
      
      This patch changes arch_timer_reg_read_stable() to use
      preempt_{disable,enable}_notrace(), which avoids this.
      
      This problem is similar to the fixed by upstream commit 96b3d28b
      ("sched/clock: Prevent tracing recursion in sched_clock_cpu()").
      
      Fixes: 6acc71cc ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs")
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      adb4f11e