1. 16 March 2015, 1 commit
    • sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power... · b253149b
      Committed by Len Brown
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
      
      In Linux-3.9 we removed the mwait_idle() loop:
      
        69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
      
      The reasoning was that modern machines should be sufficiently
      happy during the boot process using the default_idle() HALT
      loop, until cpuidle loads and either acpi_idle or intel_idle
      invoke the newer MWAIT-with-hints idle loop.
      
      But two machines reported problems:
      
       1. Certain Core2-era machines support MWAIT-C1 and HALT only.
          MWAIT-C1 is preferred for optimal power and performance.
          But if they support just C1, cpuidle never loads and
          so they use the boot-time default idle loop forever.
      
       2. Some laptops will boot-hang if HALT is used,
          but will boot successfully if MWAIT is used.
          This appears to be a hidden assumption in the BIOS SMI handler,
          one that is presumably valid on the proprietary OS
          where the BIOS was validated.
      
             https://bugzilla.kernel.org/show_bug.cgi?id=60770
      
      So here we effectively revert the patch above, restoring
      the mwait_idle() loop.  However, we don't bother restoring
      the idle=mwait cmdline parameter, since it appears to add
      no value.
      
      Maintainer notes:
      
        For 3.9, simply revert 69fb3676.
        For 3.10, the patch applies with -F3; fuzz is needed due to __cpuinit use in the context.
        For 3.11, 3.12 and 3.13, this patch applies cleanly.
      Tested-by: Mike Galbraith <bitbucket@online.de>
      Signed-off-by: Len Brown <len.brown@intel.com>
      Acked-by: Mike Galbraith <bitbucket@online.de>
      Cc: <stable@vger.kernel.org> # 3.9+
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ian Malone <ibmalone@gmail.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
      [ Ported to recent kernels. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b253149b
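      As background for the discussion above, here is a minimal, self-contained
      sketch of the MONITOR/MWAIT idle pattern that mwait_idle() is built
      around. It is an illustration only, not the restored kernel function;
      the "work pending" flag and the hint value 0 are hypothetical stand-ins
      (hint 0 corresponds roughly to the MWAIT-C1 case discussed above).

        #include <stdint.h>

        /* MONITOR arms address monitoring on the cache line containing addr
         * (RAX = address, ECX = extensions, EDX = hints). */
        static inline void cpu_monitor(const volatile void *addr)
        {
                asm volatile("monitor" :: "a"(addr), "c"(0UL), "d"(0UL));
        }

        /* MWAIT waits in an optimized C-state until the monitored line is
         * written or an interrupt arrives (EAX = hints, ECX = extensions). */
        static inline void cpu_mwait(unsigned long hints)
        {
                asm volatile("mwait" :: "a"(hints), "c"(0UL));
        }

        /* One simplified idle step: arm the monitor on a "work pending" flag,
         * re-check the flag, then sleep until that cache line is touched. */
        static void mwait_idle_step(const volatile uint32_t *work_pending)
        {
                cpu_monitor(work_pending);
                if (!*work_pending)
                        cpu_mwait(0);
        }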
  2. 01 February 2015, 3 commits
    • x86_64, entry: Remove the syscall exit audit and schedule optimizations · 96b6352c
      Committed by Andy Lutomirski
      We used to optimize rescheduling and audit on syscall exit.  Now
      that the full slow path is reasonably fast, remove these
      optimizations.  Syscall exit auditing is now handled exclusively by
      syscall_trace_leave.
      
      This adds something like 10ns to the previously optimized paths on
      my computer, presumably due mostly to SAVE_REST / RESTORE_REST.
      
      I think that we should eventually replace both the syscall and
      non-paranoid interrupt exit slow paths with a pair of C functions
      along the lines of the syscall entry hooks.
      
      Link: http://lkml.kernel.org/r/22f2aa4a0361707a5cfb1de9d45260b39965dead.1421453410.git.luto@amacapital.net
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      96b6352c
    • x86_64, entry: Use sysret to return to userspace when possible · 2a23c6b8
      Committed by Andy Lutomirski
      The x86_64 entry code currently jumps through complex and
      inconsistent hoops to try to minimize the impact of syscall exit
      work.  For a true fast-path syscall, almost nothing needs to be
      done, so returning is just a check for exit work and sysret.  For a
      full slow-path return from a syscall, the C exit hook is invoked if
      needed and we join the iret path.
      
      Using iret to return to userspace is very slow, so the entry code
      has accumulated various special cases to try to do certain forms of
      exit work without invoking iret.  This is error-prone, since it
      duplicates assembly code paths, and it's dangerous, since sysret
      can malfunction in interesting ways if used carelessly.  It's
      also inefficient, since a lot of useful cases aren't optimized
      and therefore force an iret out of a combination of paranoia and
      the fact that no one has bothered to write even more asm code
      to avoid it.
      
      I would argue that this approach is backwards.  Rather than trying
      to avoid the iret path, we should instead try to make the iret path
      fast.  Under a specific set of conditions, iret is unnecessary.  In
      particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not
      set, and both SS and CS are as expected, then
      movq 32(%rsp),%rsp;sysret does the same thing as iret.  This set of
      conditions is nearly always satisfied on return from syscalls, and
      it can even occasionally be satisfied on return from an irq.
      
      Even with the careful checks for sysret applicability, this cuts
      nearly 80ns off of the overhead from syscalls with unoptimized exit
      work.  This includes tracing and context tracking, and any return
      that invokes KVM's user return notifier.  For example, the cost of
      getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to
      ~280ns on my computer.
      
      This may allow the removal and even eventual conversion to C
      of a respectable amount of exit asm.
      
      This may require further tweaking to give the full benefit on Xen.
      
      It may be worthwhile to adjust signal delivery and exec to try to
      hit the sysret path.
      
      This does not optimize returns to 32-bit userspace.  Making the same
      optimization for CS == __USER32_CS is conceptually straightforward,
      but it will require some tedious code to handle the differences
      between sysretl and sysexitl.
      
      Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      2a23c6b8
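      As a reading aid, the sysret-applicability conditions listed above can be
      written as a small C predicate over a hypothetical saved-register struct.
      The real check is done in assembly on the entry stack frame; the segment
      selector values below are the usual Linux x86_64 ones and, like the
      struct and function names, are shown purely for illustration.

        #include <stdbool.h>
        #include <stdint.h>

        #define X86_EFLAGS_RF   (1ULL << 16)    /* resume flag; sysret cannot restore it */
        #define USER_CS_64      0x33            /* typical 64-bit user code selector */
        #define USER_SS_64      0x2b            /* typical user data/stack selector  */

        /* Hypothetical snapshot of the registers saved on syscall/irq entry. */
        struct saved_regs {
                uint64_t rip, rcx, r11, rflags;
                uint16_t cs, ss;
        };

        /* A 48-bit virtual address is canonical if bits 63..47 all equal bit 47. */
        static bool is_canonical(uint64_t addr)
        {
                return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
        }

        /* When all of these hold, "movq 32(%rsp),%rsp; sysret" acts like iret. */
        static bool sysret_is_safe(const struct saved_regs *r)
        {
                return r->rip == r->rcx &&              /* sysret reloads RIP from RCX   */
                       r->rflags == r->r11 &&           /* ...and RFLAGS from R11        */
                       is_canonical(r->rip) &&          /* non-canonical RIP would fault */
                       !(r->rflags & X86_EFLAGS_RF) &&  /* RF cannot be restored         */
                       r->cs == USER_CS_64 &&           /* must return to 64-bit user CS */
                       r->ss == USER_SS_64;             /* ...with the expected user SS  */
        }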
    • x86, traps: Fix ist_enter from userspace · b926e6f6
      Committed by Andy Lutomirski
      context_tracking_user_exit() has no effect if in_interrupt() returns true,
      so ist_enter() didn't work.  Fix it by calling exception_enter(), and thus
      context_tracking_user_exit(), before incrementing the preempt count.
      
      This also adds an assertion that will catch the problem reliably when
      CONFIG_PROVE_RCU=y, to help prevent the bug from being reintroduced.
      
      Link: http://lkml.kernel.org/r/261ebee6aee55a4724746d0d7024697013c40a08.1422709102.git.luto@amacapital.net
      Fixes: 95927475 ("x86, traps: Track entry into and exit from IST context")
      Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      b926e6f6
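      A toy model of the ordering issue may help: the names below are
      hypothetical stand-ins, not the kernel helpers, but they capture the one
      property that matters, namely that the user-context exit is a no-op once
      the preempt count makes in_interrupt() true, so it has to run first.

        #include <assert.h>
        #include <stdbool.h>

        static int preempt_count;      /* nonzero plays the role of in_interrupt() */
        static bool in_user_context;   /* toy context-tracking state               */

        /* Mirrors the real behaviour: does nothing if already "in interrupt". */
        static void context_tracking_user_exit_model(void)
        {
                if (preempt_count != 0)
                        return;
                in_user_context = false;
        }

        static void ist_enter_model(bool from_user)
        {
                if (from_user)
                        context_tracking_user_exit_model();  /* must run first      */
                preempt_count++;                             /* only now bump count */
                assert(!in_user_context);  /* stand-in for the CONFIG_PROVE_RCU check */
        }

        int main(void)
        {
                in_user_context = true;
                ist_enter_model(true);     /* passes with the fixed ordering */
                return 0;
        }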
  3. 28 January 2015, 4 commits
  4. 23 January 2015, 5 commits
  5. 22 January 2015, 23 commits
  6. 20 January 2015, 4 commits
    • x86, hyperv: Mark the Hyper-V clocksource as being continuous · 32c6590d
      Committed by K. Y. Srinivasan
      The Hyper-V clocksource is continuous; mark it accordingly.
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Acked-by: jasowang@redhat.com
      Cc: gregkh@linuxfoundation.org
      Cc: devel@linuxdriverproject.org
      Cc: olaf@aepfle.de
      Cc: apw@canonical.com
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      32c6590d
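      The change itself is essentially one flag in the clocksource registration.
      A sketch of the shape of such a registration follows; everything except
      the .flags line (the callback body, the name, the rating) is a placeholder
      rather than the real Hyper-V driver's values.

        #include <linux/clocksource.h>

        /* Placeholder read callback; the real driver reads a Hyper-V time reference counter. */
        static cycle_t read_hv_clock_sketch(struct clocksource *cs)
        {
                return 0;
        }

        static struct clocksource hyperv_cs_sketch = {
                .name   = "hyperv_clocksource",
                .rating = 400,                          /* illustrative rating         */
                .read   = read_hv_clock_sketch,
                .mask   = CLOCKSOURCE_MASK(64),
                .flags  = CLOCK_SOURCE_IS_CONTINUOUS,   /* the fix: mark it continuous */
        };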
    • x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() · 7575637a
      Committed by Oleg Nesterov
      math_state_restore() can race with kernel_fpu_begin(): if an irq arrives
      right after __thread_fpu_begin(), __save_init_fpu() will overwrite the
      fpu->state we are about to restore.
      
      Add two simple helpers, kernel_fpu_disable() and kernel_fpu_enable(),
      which simply set and clear in_kernel_fpu, and change math_state_restore()
      to exclude kernel_fpu_begin() in between.
      
      Alternatively we could use local_irq_save/restore, but these new
      helpers will probably find more users.

      Perhaps they should disable/enable preemption themselves; in that case
      we could remove the preempt_disable() in __restore_xstate_sig().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: matt.fleming@intel.com
      Cc: bp@suse.de
      Cc: pbonzini@redhat.com
      Cc: luto@amacapital.net
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Link: http://lkml.kernel.org/r/20150115192028.GD27332@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7575637a
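      A toy, single-CPU model of the exclusion being added (hypothetical names,
      not the kernel code): math_state_restore() brackets its critical section
      with the new disable/enable helpers, and irq-context FPU users consult
      the flag before proceeding.

        #include <stdbool.h>

        static bool in_kernel_fpu;   /* per-CPU in the real kernel; one CPU modelled here */

        static void kernel_fpu_disable_model(void) { in_kernel_fpu = true;  }
        static void kernel_fpu_enable_model(void)  { in_kernel_fpu = false; }

        /* What an irq-context FPU user would check before kernel_fpu_begin(). */
        static bool irq_fpu_usable_model(void)
        {
                return !in_kernel_fpu;
        }

        static void math_state_restore_model(void)
        {
                kernel_fpu_disable_model();
                /* __thread_fpu_begin() and the restore of fpu->state go here;
                 * an irq arriving now sees irq_fpu_usable_model() == false and
                 * therefore cannot overwrite the state being restored. */
                kernel_fpu_enable_model();
        }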
    • x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end() · 33a3ebdc
      Committed by Oleg Nesterov
      Now that we have in_kernel_fpu, we can remove __thread_clear_has_fpu()
      from __kernel_fpu_begin(). This also allows us to replace the asymmetrical
      and nontrivial use_eager_fpu + tsk_used_math check in kernel_fpu_end()
      with the same __thread_has_fpu() check.
      
      The logic becomes really simple: if _begin() does save(), then _end()
      needs restore(), and this is controlled by __thread_has_fpu(). Otherwise
      they do clts/stts, unless use_eager_fpu() is in effect.
      
      Not only does this make begin/end symmetrical and, in my opinion, more
      understandable; it also potentially allows changing irq_fpu_usable() to
      avoid all checks other than "in_kernel_fpu".
      
      Also, with this patch __kernel_fpu_end() does restore_fpu_checking()
      and WARNs if it fails, instead of calling math_state_restore(). I think
      this looks better because we no longer need __thread_fpu_begin(), and
      it is better to report the failure in this case.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: matt.fleming@intel.com
      Cc: bp@suse.de
      Cc: pbonzini@redhat.com
      Cc: luto@amacapital.net
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Link: http://lkml.kernel.org/r/20150115192005.GC27332@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      33a3ebdc
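      The symmetry described above fits in a short outline; the helpers below
      are hypothetical and only echo the commit's wording, with the same
      "did the interrupted task own the FPU?" answer driving both halves.

        #include <stdbool.h>
        #include <stdio.h>

        static void kernel_fpu_begin_outline(bool thread_has_fpu, bool eager_fpu)
        {
                if (thread_has_fpu)
                        puts("begin: save the interrupted task's FPU state");
                else if (!eager_fpu)
                        puts("begin: clts, so the kernel can use the FPU without faulting");
        }

        static void kernel_fpu_end_outline(bool thread_has_fpu, bool eager_fpu)
        {
                if (thread_has_fpu)
                        puts("end: restore the saved state and WARN if the restore fails");
                else if (!eager_fpu)
                        puts("end: stts, re-arming the lazy-FPU trap");
        }

        int main(void)
        {
                kernel_fpu_begin_outline(true, false);
                kernel_fpu_end_outline(true, false);
                return 0;
        }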
    • x86, fpu: Introduce per-cpu in_kernel_fpu state · 14e153ef
      Committed by Oleg Nesterov
      interrupted_kernel_fpu_idle() tries to detect whether kernel_fpu_begin()
      is safe or not. In particular, it should obviously deny a nested
      kernel_fpu_begin(), and this logic looks very confusing.
      
      If use_eager_fpu() is true, we rely on a) the __thread_has_fpu() check in
      interrupted_kernel_fpu_idle(), and b) the fact that _begin() does
      __thread_clear_has_fpu().
      
      Otherwise we demand that the interrupted task has no FPU if it is in
      kernel mode; this works because __kernel_fpu_begin() does clts() and
      interrupted_kernel_fpu_idle() checks X86_CR0_TS.
      
      Add the per-cpu "bool in_kernel_fpu" variable, and change this code
      to check/set/clear it. This allows us to do more cleanups and fixes;
      see the next changes.
      
      The patch also moves WARN_ON_ONCE() under preempt_disable() just to
      make this_cpu_read() look better; this is not really needed. In fact,
      I think we should move it into __kernel_fpu_begin().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: matt.fleming@intel.com
      Cc: bp@suse.de
      Cc: pbonzini@redhat.com
      Cc: luto@amacapital.net
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Link: http://lkml.kernel.org/r/20150115191943.GB27332@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      14e153ef
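      For reference, the per-cpu flag pattern described above looks roughly like
      the following sketch. The function bodies are reduced to the parts that
      matter here, so treat it as an outline of the idea rather than the exact
      patch; the "_sketch" names are not the kernel's.

        #include <linux/percpu.h>
        #include <linux/bug.h>

        static DEFINE_PER_CPU(bool, in_kernel_fpu);

        void __kernel_fpu_begin_sketch(void)
        {
                /* Nested kernel-FPU sections are never legitimate: complain loudly. */
                WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
                this_cpu_write(in_kernel_fpu, true);
                /* ... save or drop the interrupted task's FPU state here ... */
        }

        void __kernel_fpu_end_sketch(void)
        {
                /* ... restore the state or set CR0.TS back here ... */
                this_cpu_write(in_kernel_fpu, false);
        }

        /* The interrupted_kernel_fpu_idle()-style test collapses to one read. */
        bool kernel_fpu_nested_sketch(void)
        {
                return this_cpu_read(in_kernel_fpu);
        }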