1. 16 3月, 2017 2 次提交
    • T
      x86: Remap GDT tables in the fixmap section · 69218e47
      Thomas Garnier 提交于
      Each processor holds a GDT in its per-cpu structure. The sgdt
      instruction gives the base address of the current GDT. This address can
      be used to bypass KASLR memory randomization. With another bug, an
      attacker could target other per-cpu structures or deduce the base of
      the main memory section (PAGE_OFFSET).
      
      This patch relocates the GDT table for each processor inside the
      fixmap section. The space is reserved based on number of supported
      processors.
      
      For consistency, the remapping is done by default on 32 and 64-bit.
      
      Each processor switches to its remapped GDT at the end of
      initialization. For hibernation, the main processor returns with the
      original GDT and switches back to the remapping at completion.
      
      This patch was tested on both architectures. Hibernation and KVM were
      both tested specially for their usage of the GDT.
      
      Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
      recommending changes for Xen support.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69218e47
    • T
      x86/mm: Adapt MODULES_END based on fixmap section size · f06bdd40
      Thomas Garnier 提交于
      This patch aligns MODULES_END to the beginning of the fixmap section.
      It optimizes the space available for both sections. The address is
      pre-computed based on the number of pages required by the fixmap
      section.
      
      It will allow GDT remapping in the fixmap section. The current
      MODULES_END static address does not provide enough space for the kernel
      to support a large number of processors.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com
      [ Small build fix. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f06bdd40
  2. 14 3月, 2017 7 次提交
  3. 13 3月, 2017 4 次提交
    • D
      x86/mm: Make mmap(MAP_32BIT) work correctly · 3e6ef9c8
      Dmitry Safonov 提交于
      mmap(MAP_32BIT) is broken due to the dependency on the TIF_ADDR32 thread
      flag.
      
      For 64bit applications MAP_32BIT will force legacy bottom-up allocations and
      the 1GB address space restriction even if the application issued a compat
      syscall, which should not be subject of these restrictions.
      
      For 32bit applications, which issue 64bit syscalls the newly introduced
      mmap base separation into 64-bit and compat bases changed the behaviour
      because now a 64-bit mapping is returned, but due to the TIF_ADDR32
      dependency MAP_32BIT is ignored. Before the separation a 32-bit mapping was
      returned, so the MAP_32BIT handling was irrelevant.
      
      Replace the check for TIF_ADDR32 with a check for the compat syscall. That
      solves both the 64-bit issuing a compat syscall and the 32-bit issuing a
      64-bit syscall problems.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-5-dsafonov@virtuozzo.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      3e6ef9c8
    • D
      x86/mm: Introduce mmap_compat_base() for 32-bit mmap() · 1b028f78
      Dmitry Safonov 提交于
      mmap() uses a base address, from which it starts to look for a free space
      for allocation.
      
      The base address is stored in mm->mmap_base, which is calculated during
      exec(). The address depends on task's size, set rlimit for stack, ASLR
      randomization. The base depends on the task size and the number of random
      bits which are different for 64-bit and 32bit applications.
      
      Due to the fact, that the base address is fixed, its mmap() from a compat
      (32bit) syscall issued by a 64bit task will return a address which is based
      on the 64bit base address and does not fit into the 32bit address space
      (4GB). The returned pointer is truncated to 32bit, which results in an
      invalid address.
      
      To solve store a seperate compat address base plus a compat legacy address
      base in mm_struct. These bases are calculated at exec() time and can be
      used later to address the 32bit compat mmap() issued by 64 bit
      applications.
      
      As a consequence of this change 32-bit applications issuing a 64-bit
      syscall (after doing a long jump) will get a 64-bit mapping now. Before
      this change 32-bit applications always got a 32bit mapping.
      
      [ tglx: Massaged changelog and added a comment ]
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-4-dsafonov@virtuozzo.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1b028f78
    • D
      x86/mm: Add task_size parameter to mmap_base() · 8f3e474f
      Dmitry Safonov 提交于
      To correctly handle 32-bit and 64-bit mmap() syscalls in 64bit applications
      its required to have separate address bases to place a mapping.
      
      The tasksize can be used as an indicator to select the proper parameters
      for mmap_base().
      
      This requires the following changes:
      
       - Add task_size argument to mmap_base() and make the calculation based on it.
       - Provide mmap_legacy_base() as a seperate function
       - Use the new functions in arch_pick_mmap_layout()
      
      [ tglx: Massaged changelog ]
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-3-dsafonov@virtuozzo.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8f3e474f
    • D
      x86/mm: Introduce arch_rnd() to compute 32/64 mmap random base · 6a0b41d1
      Dmitry Safonov 提交于
      The compat (32bit) mmap() sycall issued by a 64-bit task results in a
      mapping above 4GB. That's outside the compat mode address space and
      prevents CRIU to restore 32bit processes from a 64bit application.
      
      As a first step to address this, split out the address base randomizing
      calculation from arch_mmap_rnd() into a helper function, which can be used
      independent of mmap_ia32() based decisions.
      
      [ tglx: Massaged changelog ]
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Cc: 0x7f454c46@gmail.com
      Cc: linux-mm@kvack.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170306141721.9188-2-dsafonov@virtuozzo.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      6a0b41d1
  4. 12 3月, 2017 2 次提交
    • D
      x86/tlb: Fix tlb flushing when lguest clears PGE · 2c4ea6e2
      Daniel Borkmann 提交于
      Fengguang reported random corruptions from various locations on x86-32
      after commits d2852a22 ("arch: add ARCH_HAS_SET_MEMORY config") and
      9d876e79 ("bpf: fix unlocking of jited image when module ronx not set")
      that uses the former. While x86-32 doesn't have a JIT like x86_64, the
      bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
      ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
      support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
      enabled.
      
      After investigating the crashes further, it turned out that using
      set_memory_ro() and set_memory_rw() didn't have the desired effect, for
      example, setting the pages as read-only on x86-32 would still let
      probe_kernel_write() succeed without error. This behavior would manifest
      itself in situations where the vmalloc'ed buffer was accessed prior to
      set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
      wasn't, the page attribute changes seemed to have taken effect, leading to
      the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
      that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
      "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger
      a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(),
      though.
      
      There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
      on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
      (depends on X86_FEATURE_PGE), and cr3 based flush.  For "-cpu host" case in
      my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu
      kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually
      worked fine, and further investigating the cr4 one turned out that
      X86_CR4_PGE bit was not set in cr4 register, meaning the
      __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same
      value instead of clearing X86_CR4_PGE in the first write to trigger the
      flush.
      
      It turned out that X86_CR4_PGE was cleared from cr4 during init from
      lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
      cleared from there due to concerns of using PGE in guest kernel that can
      lead to hard to trace bugs (see bff672e6 ("lguest: documentation V:
      Host") in init()). The CPU feature bits are cleared in dynamic
      boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
      static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
      flushing to use, meaning they still used the old setting of the host
      kernel.
      
      Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
      to static_cpu_has() checks is too late at this point as sections have been
      patched already, so for now, it seems reasonable to switch back to
      boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95
      ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
      cr3 as originally intended, properly makes the new page attributes visible
      and thus fixes the crashes seen by Fengguang.
      
      Fixes: c109bf95 ("x86/cpufeature: Remove cpu_has_pge")
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: bp@suse.de
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: lkp@01.org
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
      Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2c4ea6e2
    • G
      score: Fix implicit includes now failing build after extable change · 0acf6119
      Guenter Roeck 提交于
      After changing from module.h to extable.h, score builds fail with:
      
        arch/score/kernel/traps.c: In function 'do_ri':
        arch/score/kernel/traps.c:248:4: error: implicit declaration of function 'user_disable_single_step'
        arch/score/mm/extable.c: In function 'fixup_exception':
        arch/score/mm/extable.c:32:38: error: dereferencing pointer to incomplete type
        arch/score/mm/extable.c:34:24: error: dereferencing pointer to incomplete type
      
      because extable.h doesn't drag in the same amount of headers as the
      module.h did.  Add in the headers which were implicitly expected.
      
      Fixes: 90858794 ("module.h: remove extable.h include now users have migrated")
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      [PG: tweak commit log; refresh for sched header refactoring.]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      0acf6119
  5. 11 3月, 2017 1 次提交
    • T
      kexec, x86/purgatory: Unbreak it and clean it up · 40c50c1f
      Thomas Gleixner 提交于
      The purgatory code defines global variables which are referenced via a
      symbol lookup in the kexec code (core and arch).
      
      A recent commit addressing sparse warnings made these static and thereby
      broke kexec_file.
      
      Why did this happen? Simply because the whole machinery is undocumented and
      lacks any form of forward declarations. The variable names are unspecific
      and lack a prefix, so adding forward declarations creates shadow variables
      in the core code. Aside of that the code relies on magic constants and
      duplicate struct definitions with no way to ensure that these things stay
      in sync. The section placement of the purgatory variables happened by
      chance and not by design.
      
      Unbreak kexec and cleanup the mess:
      
       - Add proper forward declarations and document the usage
       - Use common struct definition
       - Use the proper common defines instead of magic constants
       - Add a purgatory_ prefix to have a proper name space
       - Use ARRAY_SIZE() instead of a homebrewn reimplementation
       - Add proper sections to the purgatory variables [ From Mike ]
      
      Fixes: 72042a8c ("x86/purgatory: Make functions and variables static")
      Reported-by: NMike Galbraith <&lt;efault@gmx.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Nicholas Mc Guire <der.herr@hofr.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Tobin C. Harding" <me@tobin.cc>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      40c50c1f
  6. 10 3月, 2017 10 次提交
  7. 09 3月, 2017 5 次提交
    • R
      KVM: nVMX: do not warn when MSR bitmap address is not backed · 05d8d346
      Radim Krčmář 提交于
      Before trying to do nested_get_page() in nested_vmx_merge_msr_bitmap(),
      we have already checked that the MSR bitmap address is valid (4k aligned
      and within physical limits).  SDM doesn't specify what happens if the
      there is no memory mapped at the valid address, but Intel CPUs treat the
      situation as if the bitmap was configured to trap all MSRs.
      
      KVM already does that by returning false and a correct handling doesn't
      need the guest-trigerrable warning that was reported by syzkaller:
      (The warning was originally there to catch some possible bugs in nVMX.)
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709
        nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline]
        WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709
        nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 7832 Comm: syz-executor1 Not tainted 4.10.0+ #229
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:15 [inline]
         dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
         panic+0x1fb/0x412 kernel/panic.c:179
         __warn+0x1c4/0x1e0 kernel/panic.c:540
         warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
         nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline]
         nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640
         enter_vmx_non_root_mode arch/x86/kvm/vmx.c:10471 [inline]
         nested_vmx_run+0x6186/0xaab0 arch/x86/kvm/vmx.c:10561
         handle_vmlaunch+0x1a/0x20 arch/x86/kvm/vmx.c:7312
         vmx_handle_exit+0xfc0/0x3f00 arch/x86/kvm/vmx.c:8526
         vcpu_enter_guest arch/x86/kvm/x86.c:6982 [inline]
         vcpu_run arch/x86/kvm/x86.c:7044 [inline]
         kvm_arch_vcpu_ioctl_run+0x1418/0x4840 arch/x86/kvm/x86.c:7205
         kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2570
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      [Jim Mattson explained the bare metal behavior: "I believe this behavior
       would be documented in the chipset data sheet rather than the SDM,
       since the chipset returns all 1s for an unclaimed read."]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      05d8d346
    • L
      KVM: arm64: Increase number of user memslots to 512 · 955a3fc6
      Linu Cherian 提交于
      Having only 32 memslots is a real constraint for the maximum
      number of PCI devices that can be assigned to a single guest.
      Assuming each PCI device/virtual function having two memory BAR
      regions, we could assign only 15 devices/virtual functions to a
      guest.
      
      Hence increase KVM_USER_MEM_SLOTS to 512 as done in other archs like
      powerpc.
      Reviewed-by: NChristoffer Dall <cdall@linaro.org>
      Signed-off-by: NLinu Cherian <linu.cherian@cavium.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      955a3fc6
    • L
      KVM: arm/arm64: Remove KVM_PRIVATE_MEM_SLOTS definition that are unused · 3e92f94a
      Linu Cherian 提交于
      arm/arm64 architecture doesnt use private memslots, hence removing
      KVM_PRIVATE_MEM_SLOTS macro definition.
      Reviewed-by: NChristoffer Dall <cdall@linaro.org>
      Signed-off-by: NLinu Cherian <linu.cherian@cavium.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      3e92f94a
    • L
      KVM: arm/arm64: Enable KVM_CAP_NR_MEMSLOTS on arm/arm64 · 7af4df85
      Linu Cherian 提交于
      Return KVM_USER_MEM_SLOTS for userspace capability query on
      NR_MEMSLOTS.
      Reviewed-by: NChristoffer Dall <cdall@linaro.org>
      Signed-off-by: NLinu Cherian <linu.cherian@cavium.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      7af4df85
    • J
      axonram: Fix gendisk handling · 672a2c87
      Jan Kara 提交于
      It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
      handling in axon_ram_probe() to avoid doing that.
      
      Also del_gendisk() does not drop a reference to gendisk allocated by
      alloc_disk(). That has to be done by put_disk(). Add that call where
      needed.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      672a2c87
  8. 08 3月, 2017 1 次提交
  9. 07 3月, 2017 7 次提交
    • M
      arm64: KVM: Survive unknown traps from guests · ba4dd156
      Mark Rutland 提交于
      Currently we BUG() if we see an ESR_EL2.EC value we don't recognise. As
      configurable disables/enables are added to the architecture (controlled
      by RES1/RES0 bits respectively), with associated synchronous exceptions,
      it may be possible for a guest to trigger exceptions with classes that
      we don't recognise.
      
      While we can't service these exceptions in a manner useful to the guest,
      we can avoid bringing down the host. Per ARM DDI 0487A.k_iss10775, page
      D7-1937, EC values within the range 0x00 - 0x2c are reserved for future
      use with synchronous exceptions, and EC values within the range 0x2d -
      0x3f may be used for either synchronous or asynchronous exceptions.
      
      The patch makes KVM handle any unknown EC by injecting an UNDEFINED
      exception into the guest, with a corresponding (ratelimited) warning in
      the host dmesg. We could later improve on this with with a new (opt-in)
      exit to the host userspace.
      
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ba4dd156
    • M
      arm: KVM: Survive unknown traps from guests · f050fe7a
      Mark Rutland 提交于
      Currently we BUG() if we see a HSR.EC value we don't recognise. As
      configurable disables/enables are added to the architecture (controlled
      by RES1/RES0 bits respectively), with associated synchronous exceptions,
      it may be possible for a guest to trigger exceptions with classes that
      we don't recognise.
      
      While we can't service these exceptions in a manner useful to the guest,
      we can avoid bringing down the host. Per ARM DDI 0406C.c, all currently
      unallocated HSR EC encodings are reserved, and per ARM DDI
      0487A.k_iss10775, page G6-4395, EC values within the range 0x00 - 0x2c
      are reserved for future use with synchronous exceptions, and EC values
      within the range 0x2d - 0x3f may be used for either synchronous or
      asynchronous exceptions.
      
      The patch makes KVM handle any unknown EC by injecting an UNDEFINED
      exception into the guest, with a corresponding (ratelimited) warning in
      the host dmesg. We could later improve on this with with a new (opt-in)
      exit to the host userspace.
      
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      f050fe7a
    • W
      KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset · 2f707d97
      Wanpeng Li 提交于
      Reported by syzkaller:
      
          WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029
          nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
          CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:15 [inline]
           dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
           panic+0x1fb/0x412 kernel/panic.c:179
           __warn+0x1c4/0x1e0 kernel/panic.c:540
           warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
           nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
           vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline]
           vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324
           kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099
           do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128
           __msr_io arch/x86/kvm/x86.c:2577 [inline]
           msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614
           kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497
           kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721
           vfs_ioctl fs/ioctl.c:43 [inline]
           do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
           SYSC_ioctl fs/ioctl.c:698 [inline]
           SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
           entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      The syzkaller folks reported a nested_run_pending warning during userspace
      clear VMX capability which is exposed to L1 before.
      
      The warning gets thrown while doing
      
      (*(uint32_t*)0x20aecfe8 = (uint32_t)0x1);
      (*(uint32_t*)0x20aecfec = (uint32_t)0x0);
      (*(uint32_t*)0x20aecff0 = (uint32_t)0x3a);
      (*(uint32_t*)0x20aecff4 = (uint32_t)0x0);
      (*(uint64_t*)0x20aecff8 = (uint64_t)0x0);
      r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul,
      		0x20aecfe8ul, 0, 0, 0, 0, 0, 0);
      
      i.e. KVM_SET_MSR ioctl with
      
      struct kvm_msrs {
      	.nmsrs = 1,
      		.pad = 0,
      		.entries = {
      			{.index = MSR_IA32_FEATURE_CONTROL,
      			 .reserved = 0,
      			 .data = 0}
      		}
      }
      
      The VMLANCH/VMRESUME emulation should be stopped since the CPU is going to
      reset here. This patch resets the nested_run_pending since the CPU is going
      to be reset hence there should be nothing pending.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Suggested-by: NRadim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      2f707d97
    • S
      irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065 · 90922a2d
      Shanker Donthineni 提交于
      On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware
      implementation uses 16Bytes for Interrupt Translation Entry (ITE),
      but reports an incorrect value of 8Bytes in GITS_TYPER.ITTE_size.
      
      It might cause kernel memory corruption depending on the number
      of MSI(x) that are configured and the amount of memory that has
      been allocated for ITEs in its_create_device().
      
      This patch fixes the potential memory corruption by setting the
      correct ITE size to 16Bytes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NShanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      90922a2d
    • G
      h8300: Fix build breakage caused by header file changes · 80aa1a54
      Guenter Roeck 提交于
      Fix the following h8300 build failures:
      
        arch/h8300/kernel/ptrace_h.c: In function ‘trace_trap’:
        arch/h8300/kernel/ptrace_h.c:253:3: error: implicit declaration of function ‘force_sig’
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Fixes: c3edc401 ("sched/headers: Move task_struct::signal and ...")
      Link: http://lkml.kernel.org/r/1488738434-3504-1-git-send-email-linux@roeck-us.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      80aa1a54
    • G
      avr32: Fix build error caused by include file reshuffling · 1fbdbcea
      Guenter Roeck 提交于
      Various avr32 builds fail:
      
        arch/avr32/oprofile/backtrace.c:58: error: dereferencing pointer to incomplete type
        arch/avr32/oprofile/backtrace.c:60: error: implicit declaration of function 'user_mode'
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: oprofile-list@lists.sf.net
      Fixes: f780d89a ("sched/headers: Remove <asm/ptrace.h> from ...")
      Link: http://lkml.kernel.org/r/1488762357-4500-1-git-send-email-linux@roeck-us.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1fbdbcea
    • J
      kvm: nVMX: VMCLEAR should not cause the vCPU to shut down · 587d7e72
      Jim Mattson 提交于
      VMCLEAR should silently ignore a failure to clear the launch state of
      the VMCS referenced by the operand.
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [Changed "kvm_write_guest(vcpu->kvm" to "kvm_vcpu_write_guest(vcpu".]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      587d7e72
  10. 06 3月, 2017 1 次提交
    • M
      powerpc: Sort the selects under CONFIG_PPC · a7d2475a
      Michael Ellerman 提交于
      We have a big list of selects under CONFIG_PPC, and currently they're
      completely unsorted. This means people tend to add new selects at the
      bottom of the list, and so two commits which both add a new select will
      often conflict.
      
      Instead sort it alphabetically. This is nicer in and of itself, but also
      means two commits that add a new select will have a greater chance of
      not conflicting.
      
      Add a note at the top and bottom asking people to keep it sorted.
      
      And while we're here pad out the 'if' expressions to make them stand
      out.
      Suggested-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a7d2475a