1. 04 2月, 2015 1 次提交
  2. 01 2月, 2015 3 次提交
    • A
      x86_64, entry: Remove the syscall exit audit and schedule optimizations · 96b6352c
      Andy Lutomirski 提交于
      We used to optimize rescheduling and audit on syscall exit.  Now
      that the full slow path is reasonably fast, remove these
      optimizations.  Syscall exit auditing is now handled exclusively by
      syscall_trace_leave.
      
      This adds something like 10ns to the previously optimized paths on
      my computer, presumably due mostly to SAVE_REST / RESTORE_REST.
      
      I think that we should eventually replace both the syscall and
      non-paranoid interrupt exit slow paths with a pair of C functions
      along the lines of the syscall entry hooks.
      
      Link: http://lkml.kernel.org/r/22f2aa4a0361707a5cfb1de9d45260b39965dead.1421453410.git.luto@amacapital.netAcked-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      96b6352c
    • A
      x86_64, entry: Use sysret to return to userspace when possible · 2a23c6b8
      Andy Lutomirski 提交于
      The x86_64 entry code currently jumps through complex and
      inconsistent hoops to try to minimize the impact of syscall exit
      work.  For a true fast-path syscall, almost nothing needs to be
      done, so returning is just a check for exit work and sysret.  For a
      full slow-path return from a syscall, the C exit hook is invoked if
      needed and we join the iret path.
      
      Using iret to return to userspace is very slow, so the entry code
      has accumulated various special cases to try to do certain forms of
      exit work without invoking iret.  This is error-prone, since it
      duplicates assembly code paths, and it's dangerous, since sysret
      can malfunction in interesting ways if used carelessly.  It's
      also inefficient, since a lot of useful cases aren't optimized
      and therefore force an iret out of a combination of paranoia and
      the fact that no one has bothered to write even more asm code
      to avoid it.
      
      I would argue that this approach is backwards.  Rather than trying
      to avoid the iret path, we should instead try to make the iret path
      fast.  Under a specific set of conditions, iret is unnecessary.  In
      particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not
      set, and both SS and CS are as expected, then
      movq 32(%rsp),%rsp;sysret does the same thing as iret.  This set of
      conditions is nearly always satisfied on return from syscalls, and
      it can even occasionally be satisfied on return from an irq.
      
      Even with the careful checks for sysret applicability, this cuts
      nearly 80ns off of the overhead from syscalls with unoptimized exit
      work.  This includes tracing and context tracking, and any return
      that invokes KVM's user return notifier.  For example, the cost of
      getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to
      ~280ns on my computer.
      
      This may allow the removal and even eventual conversion to C
      of a respectable amount of exit asm.
      
      This may require further tweaking to give the full benefit on Xen.
      
      It may be worthwhile to adjust signal delivery and exec to try hit
      the sysret path.
      
      This does not optimize returns to 32-bit userspace.  Making the same
      optimization for CS == __USER32_CS is conceptually straightforward,
      but it will require some tedious code to handle the differences
      between sysretl and sysexitl.
      
      Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.netSigned-off-by: NAndy Lutomirski <luto@amacapital.net>
      2a23c6b8
    • A
      x86, traps: Fix ist_enter from userspace · b926e6f6
      Andy Lutomirski 提交于
      context_tracking_user_exit() has no effect if in_interrupt() returns true,
      so ist_enter() didn't work.  Fix it by calling exception_enter(), and thus
      context_tracking_user_exit(), before incrementing the preempt count.
      
      This also adds an assertion that will catch the problem reliably if
      CONFIG_PROVE_RCU=y to help prevent the bug from being reintroduced.
      
      Link: http://lkml.kernel.org/r/261ebee6aee55a4724746d0d7024697013c40a08.1422709102.git.luto@amacapital.net
      Fixes: 95927475 x86, traps: Track entry into and exit from IST context
      Reported-and-tested-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      b926e6f6
  3. 31 1月, 2015 1 次提交
  4. 30 1月, 2015 6 次提交
    • R
      KVM: x86: check LAPIC presence when building apic_map · df04d1d1
      Radim Krčmář 提交于
      We forgot to re-check LAPIC after splitting the loop in commit
      173beedc (KVM: x86: Software disabled APIC should still deliver
      NMIs, 2014-11-02).
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Fixes: 173beedcSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      df04d1d1
    • M
      arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault · 0d3e4d4f
      Marc Zyngier 提交于
      When handling a fault in stage-2, we need to resync I$ and D$, just
      to be sure we don't leave any old cache line behind.
      
      That's very good, except that we do so using the *user* address.
      Under heavy load (swapping like crazy), we may end up in a situation
      where the page gets mapped in stage-2 while being unmapped from
      userspace by another CPU.
      
      At that point, the DC/IC instructions can generate a fault, which
      we handle with kvm->mmu_lock held. The box quickly deadlocks, user
      is unhappy.
      
      Instead, perform this invalidation through the kernel mapping,
      which is guaranteed to be present. The box is much happier, and so
      am I.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      0d3e4d4f
    • M
      arm/arm64: KVM: Invalidate data cache on unmap · 363ef89f
      Marc Zyngier 提交于
      Let's assume a guest has created an uncached mapping, and written
      to that page. Let's also assume that the host uses a cache-coherent
      IO subsystem. Let's finally assume that the host is under memory
      pressure and starts to swap things out.
      
      Before this "uncached" page is evicted, we need to make sure
      we invalidate potential speculated, clean cache lines that are
      sitting there, or the IO subsystem is going to swap out the
      cached view, loosing the data that has been written directly
      into memory.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      363ef89f
    • M
      arm/arm64: KVM: Use set/way op trapping to track the state of the caches · 3c1e7165
      Marc Zyngier 提交于
      Trying to emulate the behaviour of set/way cache ops is fairly
      pointless, as there are too many ways we can end-up missing stuff.
      Also, there is some system caches out there that simply ignore
      set/way operations.
      
      So instead of trying to implement them, let's convert it to VA ops,
      and use them as a way to re-enable the trapping of VM ops. That way,
      we can detect the point when the MMU/caches are turned off, and do
      a full VM flush (which is what the guest was trying to do anyway).
      
      This allows a 32bit zImage to boot on the APM thingy, and will
      probably help bootloaders in general.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3c1e7165
    • L
      arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device() · eab8d653
      Laurent Pinchart 提交于
      Commit 4bb25789 ("arm: dma-mapping: plumb our iommu mapping ops
      into arch_setup_dma_ops") moved the setting of the DMA operations from
      arm_iommu_attach_device() to arch_setup_dma_ops() where the DMA
      operations to be used are selected based on whether the device is
      connected to an IOMMU. However, the IOMMU detection scheme requires the
      IOMMU driver to be ported to the new IOMMU of_xlate API. As no driver
      has been ported yet, this effectively breaks all IOMMU ARM users that
      depend on the IOMMU being handled transparently by the DMA mapping API.
      
      Fix this by restoring the setting of DMA IOMMU ops in
      arm_iommu_attach_device() and splitting the rest of the function into a
      new internal __arm_iommu_attach_device() function, called by
      arch_setup_dma_ops().
      Signed-off-by: NLaurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      eab8d653
    • L
      vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Linus Torvalds 提交于
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
      Tested-by: NJan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33692f27
  5. 29 1月, 2015 4 次提交
  6. 28 1月, 2015 3 次提交
  7. 27 1月, 2015 1 次提交
  8. 26 1月, 2015 1 次提交
  9. 25 1月, 2015 1 次提交
  10. 24 1月, 2015 1 次提交
  11. 23 1月, 2015 10 次提交
  12. 22 1月, 2015 2 次提交
    • L
      nios2: fix kuser trampoline address · d24c8163
      Ley Foon Tan 提交于
      __kuser_sigtramp address should be 0x1044 instead of 0x1040.
      Signed-off-by: NLey Foon Tan <lftan@altera.com>
      d24c8163
    • S
      powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared · 0eb13208
      Shreyas B. Prabhu 提交于
      LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
      cause exit from power-saving mode. While waking up from winkle, restoring
      LPCR with LPCR_PECE1 set (i.e Decrementer interrupts allowed) can cause
      issue in the following scenario:
      
      - All the threads in a core are offlined. The core enters deep winkle.
      - Spurious interrupt wakes up a thread in the core. Here LPCR is restored
        with LPCR_PECE1 bit set.
      - Since it was a spurious interrupt on a offline thread, the thread clears
        the interrupt and goes back to winkle.
      - Here before the thread executes winkle and puts the core into deep winkle,
        if a decrementer interrupt occurs on any of the sibling threads in the core
        that thread wakes up.
      - Since in offline loop we are flushing interrupt only in case of external
        interrupt, the decrementer interrupt does not get flushed. So at this stage
        the thread is stuck in this is loop of waking up at 0x100 due to decrementer
        interrupt, not flushing the interrupt as only external interrupts get flushed,
        entering winkle, waking up at 0x100 again.
      
      Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit
      cleared when waking up from winkle.
      Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0eb13208
  13. 21 1月, 2015 2 次提交
  14. 20 1月, 2015 4 次提交