1. 18 8月, 2015 1 次提交
  2. 19 6月, 2015 5 次提交
    • T
      timer: Minimize nohz off overhead · 683be13a
      Thomas Gleixner 提交于
      If nohz is disabled on the kernel command line the [hr]timer code
      still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
      pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
      avoid the overhead.
      
      Before:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.73%  hog       [.] main
        15.36%  [kernel]  [k] _raw_spin_lock_irqsave
         9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.61%  [kernel]  [k] lock_timer_base.isra.38
         6.42%  [kernel]  [k] mod_timer
         3.90%  [kernel]  [k] detach_if_pending
         3.76%  [kernel]  [k] del_timer
         2.41%  [kernel]  [k] internal_add_timer
         1.39%  [kernel]  [k] __internal_add_timer
         0.76%  [kernel]  [k] timerfn
      
      We probably should have a cached value for nohz full in the per cpu
      bases as well to avoid the cpumask check. The base cache line is hot
      already, the cpumask not necessarily.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      683be13a
    • T
      timer: Reduce timer migration overhead if disabled · bc7a34b8
      Thomas Gleixner 提交于
      Eric reported that the timer_migration sysctl is not really nice
      performance wise as it needs to check at every timer insertion whether
      the feature is enabled or not. Further the check does not live in the
      timer code, so we have an extra function call which checks an extra
      cache line to figure out that it is disabled.
      
      We can do better and store that information in the per cpu (hr)timer
      bases. I pondered to use a static key, but that's a nightmare to
      update from the nohz code and the timer base cache line is hot anyway
      when we select a timer base.
      
      The old logic enabled the timer migration unconditionally if
      CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
      line.
      
      With this modification, we start off with migration disabled. The user
      visible sysctl is still set to enabled. If the kernel switches to NOHZ
      migration is enabled, if the user did not disable it via the sysctl
      prior to the switch. If nohz=off is on the kernel command line,
      migration stays disabled no matter what.
      
      Before:
        47.76%  hog       [.] main
        14.84%  [kernel]  [k] _raw_spin_lock_irqsave
         9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.71%  [kernel]  [k] mod_timer
         6.24%  [kernel]  [k] lock_timer_base.isra.38
         3.76%  [kernel]  [k] detach_if_pending
         3.71%  [kernel]  [k] del_timer
         2.50%  [kernel]  [k] internal_add_timer
         1.51%  [kernel]  [k] get_nohz_timer_target
         1.28%  [kernel]  [k] __internal_add_timer
         0.78%  [kernel]  [k] timerfn
         0.48%  [kernel]  [k] wake_up_nohz_cpu
      
      After:
        48.10%  hog       [.] main
        15.25%  [kernel]  [k] _raw_spin_lock_irqsave
         9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
         6.50%  [kernel]  [k] mod_timer
         6.44%  [kernel]  [k] lock_timer_base.isra.38
         3.87%  [kernel]  [k] detach_if_pending
         3.80%  [kernel]  [k] del_timer
         2.67%  [kernel]  [k] internal_add_timer
         1.33%  [kernel]  [k] __internal_add_timer
         0.73%  [kernel]  [k] timerfn
         0.54%  [kernel]  [k] wake_up_nohz_cpu
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Joonwoo Park <joonwoop@codeaurora.org>
      Cc: Wenbo Wang <wenbo.wang@memblaze.com>
      Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      bc7a34b8
    • P
      hrtimer: Allow hrtimer::function() to free the timer · 887d9dc9
      Peter Zijlstra 提交于
      Currently an hrtimer callback function cannot free its own timer
      because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
      after it. Freeing the timer would result in a clear use-after-free.
      
      Solve this by using a scheme similar to regular timers; track the
      current running timer in hrtimer_clock_base::running.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      887d9dc9
    • P
      hrtimer: Fix hrtimer_is_queued() hole · 8edfb036
      Peter Zijlstra 提交于
      A queued hrtimer that gets restarted (hrtimer_start*() while
      hrtimer_is_queued()) will briefly appear as unqueued/inactive, even
      though the timer has always been active, we just moved it.
      
      Close this hole by preserving timer->state in
      hrtimer_start_range_ns()'s remove_hrtimer() call.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.175989138@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8edfb036
    • O
      hrtimer: Remove HRTIMER_STATE_MIGRATE · c04dca02
      Oleg Nesterov 提交于
      I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
      confused it looks buggy and simply unneeded.
      
      migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
      is not enough: this can fool, say, hrtimer_is_queued() in
      dequeue_signal().
      
      Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
      This fixes the race and we can kill STATE_MIGRATE.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c04dca02
  3. 19 5月, 2015 1 次提交
    • V
      clockevents: Stop unused clockevent devices · d2540875
      Viresh Kumar 提交于
      To avoid getting spurious interrupts on a tickless CPU, clockevent
      device can now be stopped by switching to ONESHOT_STOPPED state.
      
      The natural place for handling this transition is tick_program_event().
      
      On 'expires == KTIME_MAX', we skip programming the event and so we need
      to fix such call sites as well, to always call tick_program_event()
      irrespective of the expires value.
      
      Once the clockevent device is required again, check if it was earlier
      put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT
      before programming its event.
      
      To make sure we haven't missed any corner case, add a WARN() for the
      case where we try to reprogram clockevent device while we aren't
      configured in ONESHOT_STOPPED state.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Cc: linaro-kernel@lists.linaro.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/5146b07be7f0bc497e0ebae036590ec2fa73e540.1428031396.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d2540875
  4. 13 5月, 2015 1 次提交
  5. 22 4月, 2015 19 次提交
  6. 03 4月, 2015 2 次提交
  7. 01 4月, 2015 1 次提交
  8. 13 2月, 2015 1 次提交
    • A
      all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski 提交于
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: NRichard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f56141e3
  9. 05 2月, 2015 1 次提交
  10. 24 1月, 2015 2 次提交
  11. 23 1月, 2015 1 次提交
    • T
      hrtimer: Prevent stale expiry time in hrtimer_interrupt() · 9bc74919
      Thomas Gleixner 提交于
      hrtimer_interrupt() has the following subtle issue:
      
      hrtimer_interrupt()
        lock(cpu_base);
        expires_next = KTIME_MAX;
      
        expire_timers(CLOCK_MONOTONIC);
        expires = get_next_timer(CLOCK_MONOTONIC);
        if (expires < expires_next)
          expires_next = expires;
      
        expire_timers(CLOCK_REALTIME);
          unlock(cpu_base);
          wakeup()
          hrtimer_start(CLOCK_MONOTONIC, newtimer);
          lock(cpu_base();  
        expires = get_next_timer(CLOCK_REALTIME);
        if (expires < expires_next)
          expires_next = expires;
      
      So because we already evaluated the next expiring timer of
      CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
      earlier than the overall next expiry time in hrtimer_interrupt().
      
      To solve this, remove the caching of the next expiry value from
      hrtimer_interrupt() and reevaluate all active clock bases for the next
      expiry value. To avoid another code duplication, create a shared
      evaluation function and use it for hrtimer_get_next_event(),
      hrtimer_force_reprogram() and hrtimer_interrupt().
      
      There is another subtlety in this mechanism:
      
      While hrtimer_interrupt() is running, we want to avoid to touch the
      hardware device because we will reprogram it anyway at the end of
      hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
      via the HRTIMER_RESTART mechanism, because we drop out when the
      callback on that CPU is running. But that fails, if a new timer gets
      enqueued like in the example above.
      
      This has another implication: While hrtimer_interrupt() is running we
      refuse remote enqueueing of timers - see hrtimer_interrupt() and
      hrtimer_check_target().
      
      hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
      to KTIME_MAX, but that fails if a new timer gets queued.
      
      Prevent both the hardware access and the remote enqueue
      explicitely. We can loosen the restriction on the remote enqueue now
      due to reevaluation of the next expiry value, but that needs a
      seperate patch.
      
      Folded in a fix from Vignesh Radhakrishnan.
      Reported-and-tested-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Based-on-patch-by: NStanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: vigneshr@codeaurora.org
      Cc: john.stultz@linaro.org
      Cc: viresh.kumar@linaro.org
      Cc: fweisbec@gmail.com
      Cc: cl@linux.com
      Cc: stuart.w.hayes@gmail.com
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9bc74919
  12. 19 9月, 2014 1 次提交
    • K
      sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() · f139caf2
      Kirill Tkhai 提交于
      schedule(), io_schedule() and schedule_timeout() always return
      with TASK_RUNNING state set, so one more setting is unnecessary.
      
      (All places in patch are visible good, only exception is
       kiblnd_scheduler() from:
      
            drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
      
       Its schedule() is one line above standard 3 lines of unified diff)
      
      No places where set_current_state() is used for mb().
      Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Anil Belur <askb23@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dmitry Eremin <dmitry.eremin@intel.com>
      Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Isaac Huang <he.huang@intel.com>
      Cc: James E.J. Bottomley <JBottomley@parallels.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liang Zhen <liang.zhen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masaru Nomura <massa.nomura@gmail.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: Peng Tao <bergwolf@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Robert Love <robert.w.love@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Cc: Zi Shen Lim <zlim.lnx@gmail.com>
      Cc: devel@driverdev.osuosl.org
      Cc: dm-devel@redhat.com
      Cc: dri-devel@lists.freedesktop.org
      Cc: fcoe-devel@open-fcoe.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-afs@lists.infradead.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: qla2xxx-upstream@qlogic.com
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f139caf2
  13. 27 8月, 2014 2 次提交
  14. 24 7月, 2014 2 次提交