1. 26 April 2018, 1 commit
• tick/sched: Do not mess with an enqueued hrtimer · 1f71addd
Committed by Thomas Gleixner
Kaike reported that in tests rdma hrtimers occasionally stopped working. He
did great debugging, which provided enough context to decode the problem.
      
      CPU 3			     	      	     CPU 2
      
      idle
      start sched_timer expires = 712171000000
       queue->next = sched_timer
      					    start rdmavt timer. expires = 712172915662
      					    lock(baseof(CPU3))
      tick_nohz_stop_tick()
      tick = 716767000000			    timerqueue_add(tmr)
      
      hrtimer_set_expires(sched_timer, tick);
        sched_timer->expires = 716767000000  <---- FAIL
      					     if (tmr->expires < queue->next->expires)
      hrtimer_start(sched_timer)		          queue->next = tmr;
      lock(baseof(CPU3))
      					     unlock(baseof(CPU3))
      timerqueue_remove()
      timerqueue_add()
      
      ts->sched_timer is queued and queue->next is pointing to it, but then
      ts->sched_timer.expires is modified.
      
      This not only corrupts the ordering of the timerqueue RB tree, it also
      makes CPU2 see the new expiry time of timerqueue->next->expires when
      checking whether timerqueue->next needs to be updated. So CPU2 sees that
      the rdma timer is earlier than timerqueue->next and sets the rdma timer as
      new next.
      
Depending on whether it had also seen the new time at RB tree enqueue, it
might have queued the rdma timer at the wrong place, and after removing
the sched_timer the RB tree is completely hosed.
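
To see why a live expiry update is fatal, consider a minimal, self-contained
sketch of an expires-ordered queue (hypothetical code: a sorted list stands in
for the kernel's RB-tree-based timerqueue, and tq_node/tq_add are made-up
names, not kernel API):

	#include <stdint.h>

	typedef int64_t ktime_t;

	struct tq_node {
		ktime_t expires;
		struct tq_node *next;	/* sorted list standing in for the rbtree */
	};

	/* Insert ordered by expires, the invariant timerqueue_add() relies on. */
	static void tq_add(struct tq_node **head, struct tq_node *node)
	{
		struct tq_node **p = head;

		/* The position picked here is only valid as long as
		 * node->expires stays constant while the node is queued. */
		while (*p && (*p)->expires <= node->expires)
			p = &(*p)->next;
		node->next = *p;
		*p = node;
	}

Writing node->expires after tq_add() returns, as happened here to the queued
sched_timer, silently breaks the sort order, so later inserts, removals and
next-expiry lookups all operate under a false invariant.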
      
The problem was introduced with a commit which tried to solve an inconsistency
between the hrtimer in the tick_sched data and the underlying hardware
clockevent. It split out hrtimer_set_expires() to store the new tick time
in both the NOHZ and the NOHZ + HIGHRES case, but missed the fact that in
the NOHZ + HIGHRES case the hrtimer might still be queued.
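
Per that description, the pre-fix sequence in tick_nohz_stop_tick() had
roughly this shape (a reconstruction from the text above, not a verbatim
quote of the offending commit):

	hrtimer_set_expires(&ts->sched_timer, tick);	/* may touch a queued timer */

	if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
		hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS_PINNED);
	else
		tick_program_event(tick, 1);

In the HIGHRES case the cancel happens only inside the start function, i.e.
after the expiry of the possibly still-enqueued timer has been rewritten.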
      
Use hrtimer_start(timer, tick...) for the NOHZ + HIGHRES case, which sets
timer->expires after canceling the timer, and move the hrtimer_set_expires()
invocation into the NOHZ-only code path, which is not affected as it merely
uses the hrtimer as next-event storage so code paths can be shared with
the NOHZ + HIGHRES case.
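
The fixed sequence then looks roughly like this (again a sketch following
the description above, not the literal diff):

	if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
		/* hrtimer_start() cancels a queued timer before writing
		 * timer->expires, so the timerqueue never sees a stale node. */
		hrtimer_start(&ts->sched_timer, tick, HRTIMER_MODE_ABS_PINNED);
	} else {
		/* NOHZ only: the hrtimer is never enqueued here, it merely
		 * stores the next event, so a plain expiry update is safe. */
		hrtimer_set_expires(&ts->sched_timer, tick);
		tick_program_event(tick, 1);
	}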
      
      Fixes: d4af6d93 ("nohz: Fix spurious warning when hrtimer and clockevent get out of sync")
Reported-by: "Wan Kaike" <kaike.wan@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: "Marciniszyn Mike" <mike.marciniszyn@intel.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: linux-rdma@vger.kernel.org
      Cc: "Dalessandro Dennis" <dennis.dalessandro@intel.com>
      Cc: "Fleck John" <john.fleck@intel.com>
      Cc: stable@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "Weiny Ira" <ira.weiny@intel.com>
      Cc: "linux-rdma@vger.kernel.org"
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804241637390.1679@nanos.tec.linutronix.de
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804242119210.1597@nanos.tec.linutronix.de
      
2. 21 April 2018, 1 commit
• fork: unconditionally clear stack on fork · e01e8063
Committed by Kees Cook
One of the classes of kernel stack content leaks[1] is exposing prior heap
or stack contents when a new process stack is allocated.  Normally, those
stacks are not zeroed, and the old contents remain in place.  In the face
of stack content exposure flaws, those contents can leak to userspace.
      
      Fixing this will make the kernel no longer vulnerable to these flaws, as
      the stack will be wiped each time a stack is assigned to a new process.
      There's not a meaningful change in runtime performance; it almost looks
      like it provides a benefit.
      
      Performing back-to-back kernel builds before:
      	Run times: 157.86 157.09 158.90 160.94 160.80
      	Mean: 159.12
      	Std Dev: 1.54
      
      and after:
      	Run times: 159.31 157.34 156.71 158.15 160.81
      	Mean: 158.46
      	Std Dev: 1.46
      
      Instead of making this a build or runtime config, Andy Lutomirski
      recommended this just be enabled by default.
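
Concretely, "unconditionally clear" comes down to two mechanisms around
kernel/fork.c and <linux/thread_info.h>; the snippet below is a simplified
sketch of them, reconstructed from the description above, not the literal
hunks:

	/* Fresh stack allocations are zeroed via the GFP mask, now always
	 * rather than only under a debug config. */
	#define THREADINFO_GFP	(GFP_KERNEL_ACCOUNT | __GFP_ZERO)

	/* Stacks recycled from the per-cpu cache (CONFIG_VMAP_STACK) skip
	 * the page allocator entirely, so they are wiped explicitly: */
	s = this_cpu_xchg(cached_stacks[i], NULL);
	if (s) {
		/* Clear stale pointers from the reused stack. */
		memset(s->addr, 0, THREAD_SIZE);
		tsk->stack_vm_area = s;
		return s->addr;
	}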
      
      [1] A noisy search for many kinds of stack content leaks can be seen here:
      https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
      
I did some more measurement with perf and cycle counts, running 100,000
execs of /bin/true.
      
      before:
      Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
      Mean:  221015379122.60
      Std Dev: 4662486552.47
      
      after:
      Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
      Mean:  217745009865.40
      Std Dev: 5935559279.99
      
It continues to look like it's faster, though the deviation is rather
wide, and I'm not sure what I could do that would be less noisy.  I'm
open to ideas!
      
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3. 19 April 2018, 1 commit
4. 17 April 2018, 9 commits
5. 14 April 2018, 14 commits
6. 12 April 2018, 9 commits
7. 11 April 2018, 5 commits