1. 21 10月, 2015 2 次提交
    • P
      powerpc/powernv: Handle irq_happened flag correctly in off-line loop · 53c656c4
      Paul Mackerras 提交于
      This fixes a bug where it is possible for an off-line CPU to fail to go
      into a low-power state (nap/sleep/winkle), and to become unresponsive to
      requests from the KVM subsystem to wake up and run a VCPU. What can
      happen is that a maskable interrupt of some kind (external, decrementer,
      hypervisor doorbell, or HMI) after we have called local_irq_disable() at
      the beginning of pnv_smp_cpu_kill_self() and before interrupts are
      hard-disabled inside power7_nap/sleep/winkle(). In this situation, the
      pending event is marked in the irq_happened flag in the PACA. This
      pending event prevents power7_nap/sleep/winkle from going to the
      requested low-power state; instead they return immediately. We don't
      deal with any of these pending event flags in the off-line loop in
      pnv_smp_cpu_kill_self() because power7_nap et al. return 0 in this case,
      so we will have srr1 == 0, and none of the processing to clear
      interrupts or doorbells will be done.
      
      Usually, the most obvious symptom of this is that a KVM guest will fail
      with a console message saying "KVM: couldn't grab cpu N".
      
      This fixes the problem by making sure we handle the irq_happened flags
      properly. First, we hard-disable before the off-line loop. Once we have
      hard-disabled, the irq_happened flags can't change underneath us. We
      unconditionally clear the DEC and HMI flags: there is no processing of
      timer interrupts while off-line, and the necessary HMI processing is all
      done in lower-level code. We leave the EE and DBELL flags alone for the
      first iteration of the loop, so that we won't fail to respond to a
      split-core request that came in just before hard-disabling. Within the
      loop, we handle external interrupts if the EE bit is set in irq_happened
      as well as if the low-power state was interrupted by an external
      interrupt. (We don't need to do the msgclr for a pending doorbell in
      irq_happened, because doorbells are edge-triggered and don't remain
      pending in hardware.) Then we clear both the EE and DBELL flags, and
      once clear, they cannot be set again (until this CPU comes online again,
      that is).
      
      This also fixes the debug check to not be done when we just ran a KVM
      guest or when the sleep didn't happen because of a pending event in
      irq_happened.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      53c656c4
    • P
      powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" · 23316316
      Paul Mackerras 提交于
      This reverts commit 9678cdaa ("Use the POWER8 Micro Partition
      Prefetch Engine in KVM HV on POWER8") because the original commit had
      multiple, partly self-cancelling bugs, that could cause occasional
      memory corruption.
      
      In fact the logmpp instruction was incorrectly using register r0 as the
      source of the buffer address and operation code, and depending on what
      was in r0, it would either do nothing or corrupt the 64k page pointed to
      by r0.
      
      The logmpp instruction encoding and the operation code definitions could
      be corrected, but then there is the problem that there is no clearly
      defined way to know when the hardware has finished writing to the
      buffer.
      
      The original commit attempted to work around this by aborting the
      write-out before starting the prefetch, but this is ineffective in the
      case where the virtual core is now executing on a different physical
      core from the one where the write-out was initiated.
      
      These problems plus advice from the hardware designers not to use the
      function (since the measured performance improvement from using the
      feature was actually mostly negative), mean that reverting the code is
      the best option.
      
      Fixes: 9678cdaa ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8")
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      23316316
  2. 09 10月, 2015 2 次提交
    • D
      powerpc/powernv: Panic on unhandled Machine Check · f2dd80ec
      Daniel Axtens 提交于
      All unrecovered machine check errors on PowerNV should cause an
      immediate panic. There are 2 reasons that this is the right policy:
      it's not safe to continue, and we're already trying to reboot.
      
      Firstly, if we go through the recovery process and do not successfully
      recover, we can't be sure about the state of the machine, and it is
      not safe to recover and proceed.
      
      Linux knows about the following sources of Machine Check Errors:
      - Uncorrectable Errors (UE)
      - Effective - Real Address Translation (ERAT)
      - Segment Lookaside Buffer (SLB)
      - Translation Lookaside Buffer (TLB)
      - Unknown/Unrecognised
      
      In the SLB, TLB and ERAT cases, we can further categorise these as
      parity errors, multihit errors or unknown/unrecognised.
      
      We can handle SLB errors by flushing and reloading the SLB. We can
      handle TLB and ERAT multihit errors by flushing the TLB. (It appears
      we may not handle TLB and ERAT parity errors: I will investigate
      further and send a followup patch if appropriate.)
      
      This leaves us with uncorrectable errors. Uncorrectable errors are
      usually the result of ECC memory detecting an error that it cannot
      correct, but they also crop up in the context of PCI cards failing
      during DMA writes, and during CAPI error events.
      
      There are several types of UE, and there are 3 places a UE can occur:
      Skiboot, the kernel, and userspace. For Skiboot errors, we have the
      facility to make some recoverable. For userspace, we can simply kill
      (SIGBUS) the affected process. We have no meaningful way to deal with
      UEs in kernel space or in unrecoverable sections of Skiboot.
      
      Currently, these unrecovered UEs fall through to
      machine_check_expection() in traps.c, which calls die(), which OOPSes
      and sends SIGBUS to the process. This sometimes allows us to stumble
      onwards. For example we've seen UEs kill the kernel eehd and
      khugepaged. However, the process killed could have held a lock, or it
      could have been a more important process, etc: we can no longer make
      any assertions about the state of the machine. Similarly if we see a
      UE in skiboot (and again we've seen this happen), we're not in a
      position where we can make any assertions about the state of the
      machine.
      
      Likewise, for unknown or unrecognised errors, we're not able to say
      anything about the state of the machine.
      
      Therefore, if we have an unrecovered MCE, the most appropriate thing
      to do is to panic.
      
      The second reason is that since e784b649 ("powerpc/powernv: Invoke
      opal_cec_reboot2() on unrecoverable machine check errors."), we
      attempt a special OPAL reboot on an unhandled MCE. This is so the
      hardware can record error data for later debugging.
      
      The comments in that commit assert that we are heading down the panic
      path anyway. At the moment this is not always true. With UEs in kernel
      space, for instance, they are marked as recoverable by the hardware,
      so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't
      panic() but would simply die() and OOPS. It doesn't make sense to be
      staggering on if we've just tried to reboot: we should panic().
      
      Explicitly panic() on unrecovered MCEs on PowerNV.
      Update the comments appropriately.
      
      This fixes some hangs following EEH events on cxlflash setups.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NIan Munsie <imunsie@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f2dd80ec
    • C
      powerpc: Fix checkstop in native_hpte_clear() with lockdep · fdf880a6
      Cyril Bur 提交于
      native_hpte_clear() is called in real mode from two places:
      - Early in boot during htab initialisation if firmware assisted dump is
        active.
      - Late in the kexec path.
      
      In both contexts there is no need to disable interrupts are they are
      already disabled. Furthermore, locking around the tlbie() is only required
      for pre POWER5 hardware.
      
      On POWER5 or newer hardware concurrent tlbie()s work as expected and on pre
      POWER5 hardware concurrent tlbie()s could result in deadlock. This code
      would only be executed at crashdump time, during which all bets are off,
      concurrent tlbie()s are unlikely and taking locks is unsafe therefore the
      best course of action is to simply do nothing. Concurrent tlbie()s are not
      possible in the first case as secondary CPUs have not come up yet.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fdf880a6
  3. 30 9月, 2015 1 次提交
  4. 29 9月, 2015 1 次提交
    • M
      powerpc/configs: Re-enable CONFIG_SCSI_DH · 6b98f2be
      Michael Ellerman 提交于
      Commit 086b91d0 ("scsi_dh: integrate into the core SCSI code")
      changed CONFIG_SCSI_DH from tristate to bool.
      
      Our defconfigs have CONFIG_SCSI_DH=m, which the kconfig machinery warns
      us is invalid, but instead of converting it to =y it leaves it unset.
      This means we loose the CONFIG_SCSI_DH code and everything that depends
      on it.
      
      So convert the values in the defconfigs to =y.
      
      Fixes: 086b91d0 ("scsi_dh: integrate into the core SCSI code")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6b98f2be
  5. 25 9月, 2015 7 次提交
    • D
      KVM: disable halt_poll_ns as default for s390x · 920552b2
      David Hildenbrand 提交于
      We observed some performance degradation on s390x with dynamic
      halt polling. Until we can provide a proper fix, let's enable
      halt_poll_ns as default only for supported architectures.
      
      Architectures are now free to set their own halt_poll_ns
      default value.
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      920552b2
    • P
      KVM: x86: fix off-by-one in reserved bits check · 58c95070
      Paolo Bonzini 提交于
      29ecd660 ("KVM: x86: avoid uninitialized variable warning",
      2015-09-06) introduced a not-so-subtle problem, which probably
      escaped review because it was not part of the patch context.
      
      Before the patch, leaf was always equal to iterator.level.  After,
      it is equal to iterator.level - 1 in the call to is_shadow_zero_bits_set,
      and when is_shadow_zero_bits_set does another "-1" the check on
      reserved bits becomes incorrect.  Using "iterator.level" in the call
      fixes this call trace:
      
      WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]()
      Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq
      [...]
      Call Trace:
        dump_stack+0x4e/0x84
        warn_slowpath_common+0x95/0xe0
        warn_slowpath_null+0x1a/0x20
        handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]
        tdp_page_fault+0x231/0x290 [kvm]
        ? emulator_pio_in_out+0x6e/0xf0 [kvm]
        kvm_mmu_page_fault+0x36/0x240 [kvm]
        ? svm_set_cr0+0x95/0xc0 [kvm_amd]
        pf_interception+0xde/0x1d0 [kvm_amd]
        handle_exit+0x181/0xa70 [kvm_amd]
        ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
        kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm]
        ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
        ? preempt_count_sub+0x9b/0xf0
        ? mutex_lock_killable_nested+0x26f/0x490
        ? preempt_count_sub+0x9b/0xf0
        kvm_vcpu_ioctl+0x358/0x710 [kvm]
        ? __fget+0x5/0x210
        ? __fget+0x101/0x210
        do_vfs_ioctl+0x2f4/0x560
        ? __fget_light+0x29/0x90
        SyS_ioctl+0x4c/0x90
        entry_SYSCALL_64_fastpath+0x16/0x73
      ---[ end trace 37901c8686d84de6 ]---
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Tested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      58c95070
    • P
      KVM: x86: use correct page table format to check nested page table reserved bits · 6fec2144
      Paolo Bonzini 提交于
      Intel CPUID on AMD host or vice versa is a weird case, but it can
      happen.  Handle it by checking the host CPU vendor instead of the
      guest's in reset_tdp_shadow_zero_bits_mask.  For speed, the
      check uses the fact that Intel EPT has an X (executable) bit while
      AMD NPT has NX.
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Tested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6fec2144
    • P
      KVM: svm: do not call kvm_set_cr0 from init_vmcb · 79a8059d
      Paolo Bonzini 提交于
      kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
      memslots array (SRCU protected).  Using a mini SRCU critical section
      is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
      the VMX vcpu_create callback calls synchronize_srcu.
      
      Fixes this lockdep splat:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.3.0-rc1+ #1 Not tainted
      -------------------------------
      include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-i38/17000:
       #0:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]
      
      [...]
      Call Trace:
       dump_stack+0x4e/0x84
       lockdep_rcu_suspicious+0xfd/0x130
       kvm_zap_gfn_range+0x188/0x1a0 [kvm]
       kvm_set_cr0+0xde/0x1e0 [kvm]
       init_vmcb+0x760/0xad0 [kvm_amd]
       svm_create_vcpu+0x197/0x250 [kvm_amd]
       kvm_arch_vcpu_create+0x47/0x70 [kvm]
       kvm_vm_ioctl+0x302/0x7e0 [kvm]
       ? __lock_is_held+0x51/0x70
       ? __fget+0x101/0x210
       do_vfs_ioctl+0x2f4/0x560
       ? __fget_light+0x29/0x90
       SyS_ioctl+0x4c/0x90
       entry_SYSCALL_64_fastpath+0x16/0x73
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      79a8059d
    • B
      ARM: sti: dt: adapt DT to fix probe/bind issues in DRM driver · 79a313f5
      Benjamin Gaignard 提交于
      STI drm drivers probe and bind using component framework was incorrect.
      In addition to drivers fix DT update is needed to make all sub-components
      become childs of sti-display-subsystem.
      Signed-off-by: NBenjamin Gaignard <benjamin.gaignard@linaro.org>
      Signed-off-by: NMaxime Coquelin <maxime.coquelin@st.com>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      79a313f5
    • K
      ARM: dts: fix omap2+ address translation for pbias · 9a5e3f27
      Kishon Vijay Abraham I 提交于
      "ARM: dts: <omap2/omap4/omap5/dra7>: add minimal l4 bus
      layout with control module support" moved pbias_regulator dt node
      from being a child node of ocp to be the child node of
      'syscon'. Since 'syscon' doesn't have the 'ranges' property,
      address translation fails while trying to convert the address
      to resource. Fix it here by populating 'ranges' property in
      syscon dt node.
      
      Fixes: 72b10ac0 ("ARM: dts: omap24xx: add minimal l4 bus
      layout with control module support")
      
      Fixes: 7415b0b4 ("ARM: dts: omap4: add minimal l4 bus layout
      with control module support")
      
      Fixes: ed8509ed ("ARM: dts: omap5: add minimal l4 bus
      layout with control module support")
      
      Fixes: d919501f ("ARM: dts: dra7: add minimal l4 bus
      layout with control module support")
      Signed-off-by: NKishon Vijay Abraham I <kishon@ti.com>
      [tony@atomide.com: fixed omap3 pbias to work]
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      9a5e3f27
    • L
      Documentation: arm: Fix typo in the idle-states bindings examples · a13f18f5
      Lorenzo Pieralisi 提交于
      The idle-states bindings mandate that the entry-method string
      in the idle-states node must be "psci" for ARM v8 64-bit systems,
      but the examples in the bindings report a wrong entry-method string.
      Owing to this typo, some dts in the kernel wrongly defined the
      entry-method property, since they likely cut and pasted the example
      definition without paying attention to the bindings definitions.
      
      This patch fixes the typo in the DT idle states bindings examples and
      respective dts in the kernel so that the bindings and related dts
      files are made compliant.
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Howard Chen <howard.chen@linaro.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: NRob Herring <robh@kernel.org>
      a13f18f5
  6. 24 9月, 2015 1 次提交
    • R
      ARM: alignment: fix alignment handling for uaccess changes · 274e91b8
      Russell King 提交于
      Jonathan Liu reports that the recent addition of CPU_SW_DOMAIN_PAN
      causes wpa_supplicant to die due to the following kernel oops:
      
      Unhandled fault: page domain fault (0x81b) at 0x001017a2
      pgd = ee1b8000
      [001017a2] *pgd=6ebee831, *pte=6c35475f, *ppte=6c354c7f
      Internal error: : 81b [#1] SMP ARM
      Modules linked in: rt2800usb rt2x00usb rt2800librt2x00lib crc_ccitt mac80211
      CPU: 1 PID: 202 Comm: wpa_supplicant Not tainted 4.3.0-rc2 #1
      Hardware name: Allwinner sun7i (A20) Family
      task: ec872f80 ti: ee364000 task.ti: ee364000
      PC is at do_alignment_ldmstm+0x1d4/0x238
      LR is at 0x0
      pc : [<c001d1d8>]    lr : [<00000000>]    psr: 600c0113
      sp : ee365e18  ip : 00000000  fp : 00000002
      r10: 001017a2  r9 : 00000002  r8 : 001017aa
      r7 : ee365fb0  r6 : e8820018  r5 : 001017a2  r4 : 00000003
      r3 : d49e30e0  r2 : 00000000  r1 : ee365fbc  r0 : 00000000
      Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none[   34.393106] Control: 10c5387d  Table: 6e1b806a  DAC: 00000051
      Process wpa_supplicant (pid: 202, stack limit = 0xee364210)
      Stack: (0xee365e18 to 0xee366000)
      ...
      [<c001d1d8>] (do_alignment_ldmstm) from [<c001d510>] (do_alignment+0x1f0/0x904)
      [<c001d510>] (do_alignment) from [<c00092a0>] (do_DataAbort+0x38/0xb4)
      [<c00092a0>] (do_DataAbort) from [<c0013d7c>] (__dabt_usr+0x3c/0x40)
      Exception stack(0xee365fb0 to 0xee365ff8)
      5fa0:                                     00000000 56c728c0 001017a2 d49e30e0
      5fc0: 775448d2 597d4e74 00200800 7a9e1625 00802001 00000021 b6deec84 00000100
      5fe0: 08020200 be9f4f20 0c0b0d0a b6d9b3e0 600c0010 ffffffff
      Code: e1a0a005 e1a0000c 1affffe8 e5913000 (e4ea3001)
      ---[ end trace 0acd3882fcfdf9dd ]---
      
      This is caused by the alignment handler not being fixed up for the
      uaccess changes, and userspace issuing an unaligned LDM instruction.
      So, fix the problem by adding the necessary fixups.
      Reported-by: NJonathan Liu <net147@gmail.com>
      Tested-by: NJonathan Liu <net147@gmail.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      274e91b8
  7. 23 9月, 2015 3 次提交
    • A
      x86, efi, kasan: #undef memset/memcpy/memmove per arch · 769a8089
      Andrey Ryabinin 提交于
      In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove
      with not-instrumented analogues __memset/__memcpy/__memove.
      
      However, on x86 the EFI stub is not linked with the kernel.  It uses
      not-instrumented mem*() functions from arch/x86/boot/compressed/string.c
      
      So we don't replace them with __mem*() variants in EFI stub.
      
      On ARM64 the EFI stub is linked with the kernel, so we should replace
      mem*() functions with __mem*(), because the EFI stub runs before KASAN
      sets up early shadow.
      
      So let's move these #undef mem* into arch's asm/efi.h which is also
      included by the EFI stub.
      
      Also, this will fix the warning in 32-bit build reported by kbuild test
      robot:
      
      	efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy'
      
      [akpm@linux-foundation.org: use 80 cols in comment]
      Signed-off-by: NAndrey Ryabinin <ryabinin.a.a@gmail.com>
      Reported-by: NFengguang Wu <fengguang.wu@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      769a8089
    • A
      x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI code · 83c133cf
      Andy Lutomirski 提交于
      The NMI entry code that switches to the normal kernel stack needs to
      be very careful not to clobber any extra stack slots on the NMI
      stack.  The code is fine under the assumption that SWAPGS is just a
      normal instruction, but that assumption isn't really true.  Use
      SWAPGS_UNSAFE_STACK instead.
      
      This is part of a fix for some random crashes that Sasha saw.
      
      Fixes: 9b6e6a83 ("x86/nmi/64: Switch stacks on userspace NMI entry")
      Reported-and-tested-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      83c133cf
    • A
      x86/paravirt: Replace the paravirt nop with a bona fide empty function · fc57a7c6
      Andy Lutomirski 提交于
      PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
      example, trimmed for readability):
      
          ff 15 00 00 00 00       callq  *0x0(%rip)        # 2796 <nmi+0x6>
                    2792: R_X86_64_PC32     pv_irq_ops+0x2c
      
      That's a call through a function pointer to regular C function that
      does nothing on native boots, but that function isn't protected
      against kprobes, isn't marked notrace, and is certainly not
      guaranteed to preserve any registers if the compiler is feeling
      perverse.  This is bad news for a CLBR_NONE operation.
      
      Of course, if everything works correctly, once paravirt ops are
      patched, it gets nopped out, but what if we hit this code before
      paravirt ops are patched in?  This can potentially cause breakage
      that is very difficult to debug.
      
      A more subtle failure is possible here, too: if _paravirt_nop uses
      the stack at all (even just to push RBP), it will overwrite the "NMI
      executing" variable if it's called in the NMI prologue.
      
      The Xen case, perhaps surprisingly, is fine, because it's already
      written in asm.
      
      Fix all of the cases that default to paravirt_nop (including
      adjust_exception_frame) with a big hammer: replace paravirt_nop with
      an asm function that is just a ret instruction.
      
      The Xen case may have other problems, so document them.
      
      This is part of a fix for some random crashes that Sasha saw.
      Reported-and-tested-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      fc57a7c6
  8. 22 9月, 2015 1 次提交
  9. 21 9月, 2015 5 次提交
    • M
      powerpc: Wire up sys_membarrier() · 793b8bf9
      Michael Ellerman 提交于
      The selftest passes on 64-bit LE & BE, and 32-bit.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      793b8bf9
    • P
      KVM: x86: trap AMD MSRs for the TSeg base and mask · 3afb1121
      Paolo Bonzini 提交于
      These have roughly the same purpose as the SMRR, which we do not need
      to implement in KVM.  However, Linux accesses MSR_K8_TSEG_ADDR at
      boot, which causes problems when running a Xen dom0 under KVM.
      Just return 0, meaning that processor protection of SMRAM is not
      in effect.
      Reported-by: NM A Young <m.a.young@durham.ac.uk>
      Cc: stable@vger.kernel.org
      Acked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3afb1121
    • T
      KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() · 3eb4ee68
      Thomas Huth 提交于
      Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
      functions) has to be protected via the kvm->srcu lock.
      The kvmppc_h_logical_ci_load() and -store() functions are missing
      this lock so far, so let's add it there, too.
      This fixes the problem that the kernel reports "suspicious RCU usage"
      when lock debugging is enabled.
      
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: 99342cf8Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      3eb4ee68
    • G
      KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit · 7e022e71
      Gautham R. Shenoy 提交于
      In guest_exit_cont we call kvmhv_commence_exit which expects the trap
      number as the argument. However r3 doesn't contain the trap number at
      this point and as a result we would be calling the function with a
      spurious trap number.
      
      Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
      r12 contains the trap number.
      
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: eddb60fbSigned-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7e022e71
    • P
      KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs · 5fc3e64f
      Paul Mackerras 提交于
      This fixes a bug which results in stale vcore pointers being left in
      the per-cpu preempted vcore lists when a VM is destroyed.  The result
      of the stale vcore pointers is usually either a crash or a lockup
      inside collect_piggybacks() when another VM is run.  A typical
      lockup message looks like:
      
      [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
      [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
      [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
      [  472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
      [  472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
      [  472.162337] REGS: c000001e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
      [  472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  XER: 00000000
      [  472.162588] CFAR: c00000000096b0ac SOFTE: 1
      GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
      GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
      GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
      GPR12: c000000000111130 c00000000fdae400
      [  472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
      [  472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
      [  472.163179] Call Trace:
      [  472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
      [  472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
      [  472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
      [  472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
      [  472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [  472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
      [  472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
      [  472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
      [  472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
      [  472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
      [  472.164060] Instruction dump:
      [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
      [  472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070
      
      The bug is that kvmppc_run_vcpu does not correctly handle the case
      where a vcpu task receives a signal while its guest vcpu is executing
      in the guest as a result of being piggy-backed onto the execution of
      another vcore.  In that case we need to wait for the vcpu to finish
      executing inside the guest, and then remove this vcore from the
      preempted vcores list.  That way, we avoid leaving this vcpu's vcore
      on the preempted vcores list when the vcpu gets interrupted.
      
      Fixes: ec257165Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5fc3e64f
  10. 18 9月, 2015 6 次提交
    • I
      kvm: svm: reset mmu on VCPU reset · ebae871a
      Igor Mammedov 提交于
      When INIT/SIPI sequence is sent to VCPU which before that
      was in use by OS, VMRUN might fail with:
      
       KVM: entry failed, hardware error 0xffffffff
       EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
       ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
       EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
       ES =0000 00000000 0000ffff 00009300
       CS =9a00 0009a000 0000ffff 00009a00
       [...]
       CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0
       [...]
       EFER=0000000000000000
      
      with corresponding SVM error:
       KVM: FAILED VMRUN WITH VMCB:
       [...]
       cpl:            0                efer:         0000000000001000
       cr0:            0000000080010010 cr2:          00007fd7fe85bf90
       cr3:            0000000187d0c000 cr4:          0000000000000020
       [...]
      
      What happens is that VCPU state right after offlinig:
      CR0: 0x80050033  EFER: 0xd01  CR4: 0x7e0
        -> long mode with CR3 pointing to longmode page tables
      
      and when VCPU gets INIT/SIPI following transition happens
      CR0: 0 -> 0x60000010 EFER: 0x0  CR4: 0x7e0
        -> paging disabled with stale CR3
      
      However SVM under the hood puts VCPU in Paged Real Mode*
      which effectively translates CR0 0x60000010 -> 80010010 after
      
         svm_vcpu_reset()
             -> init_vmcb()
                 -> kvm_set_cr0()
                     -> svm_set_cr0()
      
      but from  kvm_set_cr0() perspective CR0: 0 -> 0x60000010
      only caching bits are changed and
      commit d81135a5
       ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")'
      regressed svm_vcpu_reset() which relied on MMU being reset.
      
      As result VMRUN after svm_vcpu_reset() tries to run
      VCPU in Paged Real Mode with stale MMU context (longmode page tables),
      which causes some AMD CPUs** to bail out with VMEXIT_INVALID.
      
      Fix issue by unconditionally resetting MMU context
      at init_vmcb() time.
      
      	* AMD64 Architecture Programmer’s Manual,
      	    Volume 2: System Programming, rev: 3.25
      	      15.19 Paged Real Mode
      	** Opteron 1216
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Fixes: d81135a5
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ebae871a
    • H
      s390: wire up separate socketcalls system calls · 977108f8
      Heiko Carstens 提交于
      As discussed on linux-arch all architectures should wire up the separate
      system calls that are hidden behind the socketcall multiplexer system call.
      
      It's just a couple more system calls and gives us a very small performance
      improvement.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      977108f8
    • H
      s390/compat: remove superfluous compat wrappers · 7681df45
      Heiko Carstens 提交于
      A couple of compat wrapper functions are simply trampolines to the real
      system call. This happened because the compat wrapper defines will only
      sign and zero extend system call parameters which are of different size
      on s390/s390x (longs and pointers).
      All other parameters will be correctly sign and zero extended by the
      normal system call wrappers.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      7681df45
    • H
      s390/compat: do not trace compat wrapper functions · a55b2ae7
      Heiko Carstens 提交于
      Add notrace to the compat wrapper define to disable tracing of compat
      wrapper functions. These are supposed to be very small and more or less
      just a trampoline to the real system call.
      
      Also fix indentation.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      a55b2ae7
    • S
      alpha: lib: export __delay · 14b97ded
      Sudip Mukherjee 提交于
      __delay was not exported as a result while building with allmodconfig we
      were getting build error of undefined symbol.  __delay is being used by:
      drivers/net/phy/mdio-octeon.c
      Signed-off-by: NSudip Mukherjee <sudip@vectorindia.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      14b97ded
    • S
      alpha: io: define ioremap_uc · 969560d2
      Sudip Mukherjee 提交于
      ioremap_uc was not defined and as a result while building with
      allmodconfig were getting build error of: implicit declaration of
      function 'ioremap_uc'.
      Signed-off-by: NSudip Mukherjee <sudip@vectorindia.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      969560d2
  11. 17 9月, 2015 11 次提交