1. 28 4月, 2016 10 次提交
  2. 22 4月, 2016 2 次提交
    • J
      x86/mm/xen: Suppress hugetlbfs in PV guests · 103f6112
      Jan Beulich 提交于
      Huge pages are not normally available to PV guests. Not suppressing
      hugetlbfs use results in an endless loop of page faults when user mode
      code tries to access a hugetlbfs mapped area (since the hypervisor
      denies such PTEs to be created, but error indications can't be
      propagated out of xen_set_pte_at(), just like for various of its
      siblings), and - once killed in an oops like this:
      
        kernel BUG at .../fs/hugetlbfs/inode.c:428!
        invalid opcode: 0000 [#1] SMP
        ...
        RIP: e030:[<ffffffff811c333b>]  [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
        ...
        Call Trace:
         [<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
         [<ffffffff81167b3d>] evict+0xbd/0x1b0
         [<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
         [<ffffffff81165b0e>] dput+0x1fe/0x220
         [<ffffffff81150535>] __fput+0x155/0x200
         [<ffffffff81079fc0>] task_work_run+0x60/0xa0
         [<ffffffff81063510>] do_exit+0x160/0x400
         [<ffffffff810637eb>] do_group_exit+0x3b/0xa0
         [<ffffffff8106e8bd>] get_signal+0x1ed/0x470
         [<ffffffff8100f854>] do_signal+0x14/0x110
         [<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
         [<ffffffff814178a5>] retint_user+0x8/0x13
      
      This is CVE-2016-3961 / XSA-174.
      Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <JGross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org
      Cc: xen-devel <xen-devel@lists.xenproject.org>
      Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      103f6112
    • D
      arm64: Fix EL1/EL2 early init inconsistencies with VHE · 882416c1
      Dave Martin 提交于
      When using the Virtualisation Host Extensions, EL1 is not used in
      the host and requires no separate configuration.
      
      In addition, with VHE enabled, non-hyp-specific EL2 configuration
      that does not need to be done early will be done anyway in
      __cpu_setup via the _EL1 system register aliases.  In particular,
      the layout and definition of CPTR_EL2 are changed by enabling VHE
      so that they resemble CPACR_EL1, so existing code to initialise
      CPTR_EL2 becomes architecturally wrong in this case.
      
      This patch simply skips the affected initialisation code in the
      non-VHE case.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      882416c1
  3. 20 4月, 2016 3 次提交
  4. 18 4月, 2016 4 次提交
    • A
      arm64: fix invalidation of wrong __early_cpu_boot_status cacheline · adb49070
      Ard Biesheuvel 提交于
      In head.S, the str_l macro, which takes a source register, a symbol name
      and a temp register, is used to store a status value to the variable
      __early_cpu_boot_status. Subsequently, the value of the temp register is
      reused to invalidate any cachelines covering this variable.
      
      However, since str_l resolves to
      
            adrp    \tmp, \sym
            str     \src, [\tmp, :lo12:\sym]
      
      the temp register never actually holds the address of the variable but
      only of the 4 KB window that covers it, and reusing it leads to the
      wrong cacheline being invalidated. So instead, take the address
      explicitly before doing the store, and reuse that value to perform
      the cache invalidation.
      
      Fixes: bb905274 ("arm64: Handle early CPU boot failures")
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NSuzuki K Poulose <Suzuki.Poulose@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      adb49070
    • A
      powerpc: Update TM user feature bits in scan_features() · 4705e024
      Anton Blanchard 提交于
      We need to update the user TM feature bits (PPC_FEATURE2_HTM and
      PPC_FEATURE2_HTM) to mirror what we do with the kernel TM feature
      bit.
      
      At the moment, if firmware reports TM is not available we turn off
      the kernel TM feature bit but leave the userspace ones on. Userspace
      thinks it can execute TM instructions and it dies trying.
      
      This (together with a QEMU patch) fixes PR KVM, which doesn't currently
      support TM.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4705e024
    • A
      powerpc: Update cpu_user_features2 in scan_features() · beff8237
      Anton Blanchard 提交于
      scan_features() updates cpu_user_features but not cpu_user_features2.
      
      Amongst other things, cpu_user_features2 contains the user TM feature
      bits which we must keep in sync with the kernel TM feature bit.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      beff8237
    • A
      powerpc: scan_features() updates incorrect bits for REAL_LE · 6997e57d
      Anton Blanchard 提交于
      The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
      feature value, meaning all the remaining elements initialise the wrong
      values.
      
      This means instead of checking for byte 5, bit 0, we check for byte 0,
      bit 0, and then we incorrectly set the CPU feature bit as well as MMU
      feature bit 1 and CPU user feature bits 0 and 2 (5).
      
      Checking byte 0 bit 0 (IBM numbering), means we're looking at the
      "Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
      In practice that bit is set on all platforms which have the property.
      
      This means we set CPU_FTR_REAL_LE always. In practice that seems not to
      matter because all the modern cpus which have this property also
      implement REAL_LE, and we've never needed to disable it.
      
      We're also incorrectly setting MMU feature bit 1, which is:
      
        #define MMU_FTR_TYPE_8xx		0x00000002
      
      Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
      code, which can't run on the same cpus as scan_features(). So this also
      doesn't matter in practice.
      
      Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
      is not currently used, and bit 0 is:
      
        #define PPC_FEATURE_PPC_LE		0x00000001
      
      Which says the CPU supports the old style "PPC Little Endian" mode.
      Again this should be harmless in practice as no 64-bit CPUs implement
      that mode.
      
      Fix the code by adding the missing initialisation of the MMU feature.
      
      Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
      would be unsafe to start using it as old kernels incorrectly set it.
      
      Fixes: 44ae3ab3 ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      [mpe: Flesh out changelog, add comment reserving 0x4]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6997e57d
  5. 16 4月, 2016 3 次提交
    • V
      x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances · 1e2ae9ec
      Vitaly Kuznetsov 提交于
      Generation2 instances don't support reporting the NMI status on port 0x61,
      read from there returns 'ff' and we end up reporting nonsensical PCI
      error (as there is no PCI bus in these instances) on all NMIs:
      
          NMI: PCI system error (SERR) for reason ff on CPU 0.
          Dazed and confused, but trying to continue
      
      Fix the issue by overriding x86_platform.get_nmi_reason. Use 'booted on
      EFI' flag to detect Gen2 instances.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/1460728232-31433-1-git-send-email-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e2ae9ec
    • H
      s390: add CPU_BIG_ENDIAN config option · 2fd92273
      Heiko Carstens 提交于
      Make sure that s390 appears to be a big endian machine by defining
      this config option.
      
      Without this s390 appears to be little endian as seen by e.g. the
      recordmount script: "perl ./scripts/recordmcount.pl "s390" "little"
      "64""
      This has no practical impact within the script since the endian
      variable is only evaluated for mips. However there are already a
      couple of common code places which evaluate this config option. None
      of them is relevant for s390 currently though.
      
      To avoid any issues in the future (and fix the recordmcount oddity)
      add the new config option.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      2fd92273
    • H
      s390/spinlock: avoid yield to non existent cpu · 84976952
      Heiko Carstens 提交于
      arch_spin_lock_wait_flags() checks if a spinlock is not held before
      trying a compare and swap instruction. If the lock is unlocked it
      tries the compare and swap instruction, however if a different cpu
      grabbed the lock in the meantime the instruction will fail as
      expected.
      
      Subsequently the arch_spin_lock_wait_flags() incorrectly tries to
      figure out if the cpu that holds the lock is running. However it is
      using the wrong cpu number for this (-1) and then will also yield the
      current cpu to the wrong cpu.
      
      Fix this by adding a missing continue statement.
      
      Fixes: 470ada6b ("s390/spinlock: refactor arch_spin_lock_wait[_flags]")
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      84976952
  6. 15 4月, 2016 3 次提交
  7. 14 4月, 2016 1 次提交
  8. 13 4月, 2016 3 次提交
  9. 11 4月, 2016 4 次提交
    • G
      m68k/gpio: remove arch specific sysfs bus device · 2763ee64
      Greg Ungerer 提交于
      The ColdFire architecture specific gpio support code registers a sysfs
      bus device named "gpio". This clashes with the new generic API device
      added in commit 3c702e99 ("gpio: add a userspace chardev ABI for GPIOs").
      
      The old ColdFire sysfs gpio device was never used for anything specific,
      and no links or other nodes were created under it. The new API sysfs gpio
      device has all the same default sysfs links (device, drivers, etc) and
      they are properly populated.
      
      Remove the old ColdFire sysfs gpio registration.
      Signed-off-by: NGreg Ungerer <gerg@uclinux.org>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      2763ee64
    • P
      KVM: x86: mask CPUID(0xD,0x1).EAX against host value · 316314ca
      Paolo Bonzini 提交于
      This ensures that the guest doesn't see XSAVE extensions
      (e.g. xgetbv1 or xsavec) that the host lacks.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      316314ca
    • D
      kvm: x86: do not leak guest xcr0 into host interrupt handlers · fc5b7f3b
      David Matlack 提交于
      An interrupt handler that uses the fpu can kill a KVM VM, if it runs
      under the following conditions:
       - the guest's xcr0 register is loaded on the cpu
       - the guest's fpu context is not loaded
       - the host is using eagerfpu
      
      Note that the guest's xcr0 register and fpu context are not loaded as
      part of the atomic world switch into "guest mode". They are loaded by
      KVM while the cpu is still in "host mode".
      
      Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
      interrupt handler will look something like this:
      
      if (irq_fpu_usable()) {
              kernel_fpu_begin();
      
              [... code that uses the fpu ...]
      
              kernel_fpu_end();
      }
      
      As long as the guest's fpu is not loaded and the host is using eager
      fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
      returns true). The interrupt handler proceeds to use the fpu with
      the guest's xcr0 live.
      
      kernel_fpu_begin() saves the current fpu context. If this uses
      XSAVE[OPT], it may leave the xsave area in an undesirable state.
      According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
      if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
      xcr0[i] == 0 following an XSAVE.
      
      kernel_fpu_end() restores the fpu context. Now if any bit i in
      XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
      fault is trapped and SIGSEGV is delivered to the current process.
      
      Only pre-4.2 kernels appear to be vulnerable to this sequence of
      events. Commit 653f52c3 ("kvm,x86: load guest FPU context more eagerly")
      from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
      
      This patch fixes the bug by keeping the host's xcr0 loaded outside
      of the interrupts-disabled region where KVM switches into guest mode.
      
      Cc: stable@vger.kernel.org
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      [Move load after goto cancel_injection. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fc5b7f3b
    • X
      KVM: MMU: fix permission_fault() · 7a98205d
      Xiao Guangrong 提交于
      kvm-unit-tests complained about the PFEC is not set properly, e.g,:
      test pte.rw pte.d pte.nx pde.p pde.rw pde.pse user fetch: FAIL: error code 15
      expected 5
      Dump mapping: address: 0x123400000000
      ------L4: 3e95007
      ------L3: 3e96007
      ------L2: 2000083
      
      It's caused by the reason that PFEC returned to guest is copied from the
      PFEC triggered by shadow page table
      
      This patch fixes it and makes the logic of updating errcode more clean
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      [Do not assume pfec.p=1. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7a98205d
  10. 09 4月, 2016 5 次提交
  11. 08 4月, 2016 2 次提交