1. 30 12月, 2017 2 次提交
  2. 28 12月, 2017 1 次提交
  3. 15 12月, 2017 1 次提交
    • T
      posix-timer: Properly check sigevent->sigev_notify · cef31d9a
      Thomas Gleixner 提交于
      timer_create() specifies via sigevent->sigev_notify the signal delivery for
      the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
      and (SIGEV_SIGNAL | SIGEV_THREAD_ID).
      
      The sanity check in good_sigevent() is only checking the valid combination
      for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
      not set it accepts any random value.
      
      This has no real effects on the posix timer and signal delivery code, but
      it affects show_timer() which handles the output of /proc/$PID/timers. That
      function uses a string array to pretty print sigev_notify. The access to
      that array has no bound checks, so random sigev_notify cause access beyond
      the array bounds.
      
      Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
      masking from various code pathes as SIGEV_NONE can never be set in
      combination with SIGEV_THREAD_ID.
      Reported-by: NEric Biggers <ebiggers3@gmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org
      cef31d9a
  4. 22 11月, 2017 5 次提交
    • K
      timer: Pass function down to initialization routines · 188665b2
      Kees Cook 提交于
      In preparation for removing more macros, pass the function down to the
      initialization routines instead of doing it in macros.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      188665b2
    • K
      timer: Switch callback prototype to take struct timer_list * argument · 354b46b1
      Kees Cook 提交于
      Since all callbacks have been converted, we can switch the core
      prototype to "struct timer_list *" now too.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      354b46b1
    • K
      timer: Pass timer_list pointer to callbacks unconditionally · c1eba5bc
      Kees Cook 提交于
      Now that all timer callbacks are already taking their struct timer_list
      pointer as the callback argument, just do this unconditionally and remove
      the .data field.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      c1eba5bc
    • K
      treewide: setup_timer() -> timer_setup() · e99e88a9
      Kees Cook 提交于
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      e99e88a9
    • K
      treewide: init_timer() -> setup_timer() · b9eaf187
      Kees Cook 提交于
      This mechanically converts all remaining cases of ancient open-coded timer
      setup with the old setup_timer() API, which is the first step in timer
      conversions. This has no behavioral changes, since it ultimately just
      changes the order of assignment to fields of struct timer_list when
      finding variations of:
      
          init_timer(&t);
          f.function = timer_callback;
          t.data = timer_callback_arg;
      
      to be converted into:
      
          setup_timer(&t, timer_callback, timer_callback_arg);
      
      The conversion is done with the following Coccinelle script, which
      is an improved version of scripts/cocci/api/setup_timer.cocci, in the
      following ways:
       - assignments-before-init_timer() cases
       - limit the .data case removal to the specific struct timer_list instance
       - handling calls by dereference (timer->field vs timer.field)
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/setup_timer.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       init_timer(
      -&(e)
      +&e
       , ...)
      
      // Match the common cases first to avoid Coccinelle parsing loops with
      // "... when" clauses.
      
      @match_immediate_function_data_after_init_timer@
      expression e, func, da;
      @@
      
      -init_timer
      +setup_timer
       ( \(&e\|e\)
      +, func, da
       );
      (
      -\(e.function\|e->function\) = func;
      -\(e.data\|e->data\) = da;
      |
      -\(e.data\|e->data\) = da;
      -\(e.function\|e->function\) = func;
      )
      
      @match_immediate_function_data_before_init_timer@
      expression e, func, da;
      @@
      
      (
      -\(e.function\|e->function\) = func;
      -\(e.data\|e->data\) = da;
      |
      -\(e.data\|e->data\) = da;
      -\(e.function\|e->function\) = func;
      )
      -init_timer
      +setup_timer
       ( \(&e\|e\)
      +, func, da
       );
      
      @match_function_and_data_after_init_timer@
      expression e, e2, e3, e4, e5, func, da;
      @@
      
      -init_timer
      +setup_timer
       ( \(&e\|e\)
      +, func, da
       );
       ... when != func = e2
           when != da = e3
      (
      -e.function = func;
      ... when != da = e4
      -e.data = da;
      |
      -e->function = func;
      ... when != da = e4
      -e->data = da;
      |
      -e.data = da;
      ... when != func = e5
      -e.function = func;
      |
      -e->data = da;
      ... when != func = e5
      -e->function = func;
      )
      
      @match_function_and_data_before_init_timer@
      expression e, e2, e3, e4, e5, func, da;
      @@
      (
      -e.function = func;
      ... when != da = e4
      -e.data = da;
      |
      -e->function = func;
      ... when != da = e4
      -e->data = da;
      |
      -e.data = da;
      ... when != func = e5
      -e.function = func;
      |
      -e->data = da;
      ... when != func = e5
      -e->function = func;
      )
      ... when != func = e2
          when != da = e3
      -init_timer
      +setup_timer
       ( \(&e\|e\)
      +, func, da
       );
      
      @r1 exists@
      expression t;
      identifier f;
      position p;
      @@
      
      f(...) { ... when any
        init_timer@p(\(&t\|t\))
        ... when any
      }
      
      @r2 exists@
      expression r1.t;
      identifier g != r1.f;
      expression e8;
      @@
      
      g(...) { ... when any
        \(t.data\|t->data\) = e8
        ... when any
      }
      
      // It is dangerous to use setup_timer if data field is initialized
      // in another function.
      @script:python depends on r2@
      p << r1.p;
      @@
      
      cocci.include_match(False)
      
      @r3@
      expression r1.t, func, e7;
      position r1.p;
      @@
      
      (
      -init_timer@p(&t);
      +setup_timer(&t, func, 0UL);
      ... when != func = e7
      -t.function = func;
      |
      -t.function = func;
      ... when != func = e7
      -init_timer@p(&t);
      +setup_timer(&t, func, 0UL);
      |
      -init_timer@p(t);
      +setup_timer(t, func, 0UL);
      ... when != func = e7
      -t->function = func;
      |
      -t->function = func;
      ... when != func = e7
      -init_timer@p(t);
      +setup_timer(t, func, 0UL);
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      b9eaf187
  5. 14 11月, 2017 1 次提交
  6. 13 11月, 2017 1 次提交
  7. 12 11月, 2017 2 次提交
    • D
      timers: Add a function to start/reduce a timer · b24591e2
      David Howells 提交于
      Add a function, similar to mod_timer(), that will start a timer if it isn't
      running and will modify it if it is running and has an expiry time longer
      than the new time.  If the timer is running with an expiry time that's the
      same or sooner, no change is made.
      
      The function looks like:
      
      	int timer_reduce(struct timer_list *timer, unsigned long expires);
      
      This can be used by code such as networking code to make it easier to share
      a timer for multiple timeouts.  For instance, in upcoming AF_RXRPC code,
      the rxrpc_call struct will maintain a number of timeouts:
      
      	unsigned long	ack_at;
      	unsigned long	resend_at;
      	unsigned long	ping_at;
      	unsigned long	expect_rx_by;
      	unsigned long	expect_req_by;
      	unsigned long	expect_term_by;
      
      each of which is set independently of the others.  With timer reduction
      available, when the code needs to set one of the timeouts, it only needs to
      look at that timeout and then call timer_reduce() to modify the timer,
      starting it or bringing it forward if necessary.  There is no need to refer
      to the other timeouts to see which is earliest and no need to take any lock
      other than, potentially, the timer lock inside timer_reduce().
      
      Note, that this does not protect against concurrent invocations of any of
      the timer functions.
      
      As an example, the expect_rx_by timeout above, which terminates a call if
      we don't get a packet from the server within a certain time window, would
      be set something like this:
      
      	unsigned long now = jiffies;
      	unsigned long expect_rx_by = now + packet_receive_timeout;
      	WRITE_ONCE(call->expect_rx_by, expect_rx_by);
      	timer_reduce(&call->timer, expect_rx_by);
      
      The timer service code (which might, say, be in a work function) would then
      check all the timeouts to see which, if any, had triggered, deal with
      those:
      
      	t = READ_ONCE(call->ack_at);
      	if (time_after_eq(now, t)) {
      		cmpxchg(&call->ack_at, t, now + MAX_JIFFY_OFFSET);
      		set_bit(RXRPC_CALL_EV_ACK, &call->events);
      	}
      
      and then restart the timer if necessary by finding the soonest timeout that
      hasn't yet passed and then calling timer_reduce().
      
      The disadvantage of doing things this way rather than comparing the timers
      each time and calling mod_timer() is that you *will* take timer events
      unless you can finish what you're doing and delete the timer in time.
      
      The advantage of doing things this way is that you don't need to use a lock
      to work out when the next timer should be set, other than the timer's own
      lock - which you might not have to take.
      
      [ tglx: Fixed weird formatting and adopted it to pending changes ]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: keyrings@vger.kernel.org
      Cc: linux-afs@lists.infradead.org
      Link: https://lkml.kernel.org/r/151023090769.23050.1801643667223880753.stgit@warthog.procyon.org.uk
      b24591e2
    • A
      pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() · df27067e
      Arnd Bergmann 提交于
      __getnstimeofday() is a rather odd interface, with a number of quirks:
      
      - The caller may come from NMI context, but the implementation is not NMI safe,
        one way to get there from NMI is
      
            NMI handler:
              something bad
                panic()
                  kmsg_dump()
                    pstore_dump()
                       pstore_record_init()
                         __getnstimeofday()
      
      - The calling conventions are different from any other timekeeping functions,
        to deal with returning an error code during suspended timekeeping.
      
      Address the above issues by using a completely different method to get the
      time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior
      when timekeeping is suspended: it returns the time at which it got
      suspended. As Thomas Gleixner explained, this is safe, as
      ktime_get_real_fast_ns() does not call into the clocksource driver that
      might be suspended.
      
      The result can easily be transformed into a timespec structure. Since
      ktime_get_real_fast_ns() was not exported to modules, add the export.
      
      The pstore behavior for the suspended case changes slightly, as it now
      stores the timestamp at which timekeeping was suspended instead of storing
      a zero timestamp.
      
      This change is not addressing y2038-safety, that's subject to a more
      complex follow up patch.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Colin Cross <ccross@android.com>
      Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de
      df27067e
  8. 08 11月, 2017 3 次提交
  9. 02 11月, 2017 3 次提交
    • R
      kernel/time/Kconfig: Fix typo in comment · 6082a6e4
      Randy Dunlap 提交于
      Fix typo in Kconfig comment text.
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Jiri Kosina <trivial@kernel.org>
      Link: https://lkml.kernel.org/r/0e586dd4-2b27-864e-c252-bc72df52fd01@infradead.org
      6082a6e4
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
    • P
      clockevents: Update clockevents device next_event on stop · 39c82caf
      Prasad Sodagudi 提交于
      clockevent_device::next_event holds the next timer event of a clock event
      device. The value is updated in clockevents_program_event(), i.e. when the
      hardware timer is armed for the next expiry.
      
      When there are no software timers armed on a CPU, the corresponding per CPU
      clockevent device is brought into ONESHOT_STOPPED state, but
      clockevent_device::next_event is not updated, because
      clockevents_program_event() is not called.
      
      So the content of clockevent_device::next_event is stale, which is not an
      issue when real hardware is used. But the hrtimer broadcast device relies
      on that information and the stale value causes spurious wakeups.
      
      Update clockevent_device::next_event to KTIME_MAX when it has been brought
      into ONESHOT_STOPPED state to avoid spurious wakeups. This reflects the
      proper expiry time of the stopped timer: infinity.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: NPrasad Sodagudi <psodagud@codeaurora.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/1509043042-32486-1-git-send-email-psodagud@codeaurora.org
      39c82caf
  10. 31 10月, 2017 5 次提交
    • A
      time: Move time_t conversion helpers to time32.h · abc8f96e
      Arnd Bergmann 提交于
      On 64-bit architectures, the timespec64 based helpers in linux/time.h
      are defined as macros pointing to their timespec based counterparts.
      This made sense when they were first introduced, but as we are migrating
      away from timespec in general, it's much less intuitive now.
      
      This changes the macros to work in the exact opposite way: we always
      provide the timespec64 based helpers and define the old interfaces as
      macros for them. Now we can move those macros into linux/time32.h, which
      already contains the respective helpers for 32-bit architectures.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      abc8f96e
    • A
      time: Remove unused functions · 85bf19e7
      Arnd Bergmann 提交于
      The (slow but) ongoing work on conversion from timespec to timespec64
      has led some timespec based helper functions to become unused.
      
      No new code should use them, so we can remove the functions entirely.
      I'm planning to obsolete additional interfaces next and remove
      more of these.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      85bf19e7
    • A
      timekeeping: Use timespec64 in timekeeping_inject_offset · 1572fa03
      Arnd Bergmann 提交于
      As part of changing all the timekeeping code to use 64-bit
      time_t consistently, this removes the uses of timeval
      and timespec as much as possible from do_adjtimex() and
      timekeeping_inject_offset(). The timeval_inject_offset_valid()
      and timespec_inject_offset_valid() just complicate this,
      so I'm folding them into the respective callers.
      
      This leaves the actual 'struct timex' definition, which
      is part of the user-space ABI and should be dealt with
      separately when we have agreed on the ABI change.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      1572fa03
    • A
      timekeeping: Consolidate timekeeping_inject_offset code · e0956dcc
      Arnd Bergmann 提交于
      The code to check the adjtimex() or clock_adjtime() arguments is spread
      out across multiple files for presumably only historic reasons. As a
      preparatation for a rework to get rid of the use of 'struct timeval'
      and 'struct timespec' in there, this moves all the portions into
      kernel/time/timekeeping.c and marks them as 'static'.
      
      The warp_clock() function here is not as closely related as the others,
      but I feel it still makes sense to move it here in order to consolidate
      all callers of timekeeping_inject_offset().
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      [jstultz: Whitespace fixup]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      e0956dcc
    • J
      rtc: Allow rtc drivers to specify the tv_nsec value for ntp · 0f295b06
      Jason Gunthorpe 提交于
      ntp is currently hardwired to try and call the rtc set when wall clock
      tv_nsec is 0.5 seconds. This historical behaviour works well with certain
      PC RTCs, but is not universal to all rtc hardware.
      
      Change how this works by introducing the driver specific concept of
      set_offset_nsec, the delay between current wall clock time and the target
      time to set (with a 0 tv_nsecs).
      
      For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
      second to be written 0.5 s after it has started.
      
      For compat with the old rtc_set_ntp_time, the value is defaulted to
      + 0.5 s, which causes the next second to be written 0.5s before it starts,
      as things were before this patch.
      
      Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
      so ultimately each RTC driver should set the set_offset_nsec according
      to its needs, and non x86 architectures should stop using
      update_persistent_clock64 in order to access this feature.
      Future patches will revise the drivers as needed.
      
      Since CMOS and RTC now have very different handling they are split
      into two dedicated code paths, sharing the support code, and ifdefs
      are replaced with IS_ENABLED.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      0f295b06
  11. 27 10月, 2017 2 次提交
  12. 19 10月, 2017 1 次提交
    • J
      clockevents: Retry programming min delta up to 10 times · 3a29ddb1
      James Hogan 提交于
      When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path
      hrtimer_reprogram -> clockevents_program_event ->
      clockevents_program_min_delta will not retry if the clock event driver
      returns -ETIME.
      
      If the driver could not satisfy the program_min_delta for any reason, the
      lack of a retry means the CPU may not receive a tick interrupt, potentially
      until the counter does a full period. This leads to rcu_sched timeout
      messages as the stalled CPU is detected by other CPUs, and other issues if
      the CPU is holding locks or other resources at the point at which it
      stalls.
      
      There have been a couple of observed mechanisms through which a clock event
      driver could not satisfy the requested min_delta and return -ETIME.
      
      With the MIPS GIC driver, the shared execution resource within MT cores
      means inconventient latency due to execution of instructions from other
      hardware threads in the core, within gic_next_event, can result in an event
      being set in the past.
      
      Additionally under virtualisation it is possible to get unexpected latency
      during a clockevent device's set_next_event() callback which can make it
      return -ETIME even for a delta based on min_delta_ns.
      
      It isn't appropriate to use MIN_ADJUST in the virtualisation case as
      occasional hypervisor induced high latency will cause min_delta_ns to
      quickly increase to the maximum.
      
      Instead, borrow the retry pattern from the MIN_ADJUST case, but without
      making adjustments. Retry up to 10 times, each time increasing the
      attempted delta by min_delta, before giving up.
      
      [ Matt: Reworked the loop and made retry increase the delta. ]
      Signed-off-by: NJames Hogan <jhogan@kernel.org>
      Signed-off-by: NMatt Redfearn <matt.redfearn@mips.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: "Martin Schwidefsky" <schwidefsky@de.ibm.com>
      Cc: James Hogan <james.hogan@mips.com>
      Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com
      3a29ddb1
  13. 18 10月, 2017 2 次提交
  14. 17 10月, 2017 2 次提交
  15. 10 10月, 2017 1 次提交
  16. 05 10月, 2017 1 次提交
    • K
      timer: Convert schedule_timeout() to use from_timer() · 58e1177b
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new from_timer() helper and passing
      the timer pointer explicitly. Since this special timer is on the stack, it
      needs to have a wrapper structure to carry state once .data is eliminated.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: linux-s390@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Harish Patil <harish.patil@cavium.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-pm@vger.kernel.org
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Gross <mark.gross@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: linux-watchdog@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Michael Reed <mdr@sgi.com>
      Cc: netdev@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lkml.kernel.org/r/1507159627-127660-2-git-send-email-keescook@chromium.org
      58e1177b
  17. 26 9月, 2017 2 次提交
  18. 09 9月, 2017 1 次提交
  19. 01 9月, 2017 1 次提交
  20. 26 8月, 2017 1 次提交
    • J
      time: Fix ktime_get_raw() incorrect base accumulation · 0bcdc098
      John Stultz 提交于
      In comqit fc6eead7 ("time: Clean up CLOCK_MONOTONIC_RAW time
      handling"), the following code got mistakenly added to the update of the
      raw timekeeper:
      
       /* Update the monotonic raw base */
       seconds = tk->raw_sec;
       nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
       tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
      
      Which adds the raw_sec value and the shifted down raw xtime_nsec to the
      base value.
      
      But the read function adds the shifted down tk->tkr_raw.xtime_nsec value
      another time, The result of this is that ktime_get_raw() users (which are
      all internal users) see the raw time move faster then it should (the rate
      at which can vary with the current size of tkr_raw.xtime_nsec), which has
      resulted in at least problems with graphics rendering performance.
      
      The change tried to match the monotonic base update logic:
      
       seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
       nsec = (u32) tk->wall_to_monotonic.tv_nsec;
       tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
      
      Which adds the wall_to_monotonic.tv_nsec value, but not the
      tk->tkr_mono.xtime_nsec value to the base.
      
      To fix this, simplify the tkr_raw.base accumulation to only accumulate the
      raw_sec portion, and do not include the tkr_raw.xtime_nsec portion, which
      will be added at read time.
      
      Fixes: fc6eead7 ("time: Clean up CLOCK_MONOTONIC_RAW time handling")
      Reported-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stultz@linaro.org
      0bcdc098
  21. 24 8月, 2017 1 次提交
    • N
      timers: Fix excessive granularity of new timers after a nohz idle · 2fe59f50
      Nicholas Piggin 提交于
      When a timer base is idle, it is forwarded when a new timer is added
      to ensure that granularity does not become excessive. When not idle,
      the timer tick is expected to increment the base.
      
      However there are several problems:
      
      - If an existing timer is modified, the base is forwarded only after
        the index is calculated.
      
      - The base is not forwarded by add_timer_on.
      
      - There is a window after a timer is restarted from a nohz idle, after
        it is marked not-idle and before the timer tick on this CPU, where a
        timer may be added but the ancient base does not get forwarded.
      
      These result in excessive granularity (a 1 jiffy timeout can blow out
      to 100s of jiffies), which cause the rcu lockup detector to trigger,
      among other things.
      
      Fix this by keeping track of whether the timer base has been idle
      since it was last run or forwarded, and if so then forward it before
      adding a new timer.
      
      There is still a case where mod_timer optimises the case of a pending
      timer mod with the same expiry time, where the timer can see excessive
      granularity relative to the new, shorter interval. A comment is added,
      but it's not changed because it is an important fastpath for
      networking.
      
      This has been tested and found to fix the RCU softlockup messages.
      
      Testing was also done with tracing to measure requested versus
      achieved wakeup latencies for all non-deferrable timers in an idle
      system (with no lockup watchdogs running). Wakeup latency relative to
      absolute latency is calculated (note this suffers from round-up skew
      at low absolute times) and analysed:
      
                   max     avg      std
      upstream   506.0    1.20     4.68
      patched      2.0    1.08     0.15
      
      The bug was noticed due to the lockup detector Kconfig changes
      dropping it out of people's .configs and resulting in larger base
      clk skew When the lockup detectors are enabled, no CPU can go idle for
      longer than 4 seconds, which limits the granularity errors.
      Sub-optimal timer behaviour is observable on a smaller scale in that
      case:
      
      	     max     avg      std
      upstream     9.0    1.05     0.19
      patched      2.0    1.04     0.11
      
      Fixes: Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: NDavid Miller <davem@davemloft.net>
      Cc: dzickus@redhat.com
      Cc: sfr@canb.auug.org.au
      Cc: mpe@ellerman.id.au
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: linuxarm@huawei.com
      Cc: abdhalee@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: akpm@linux-foundation.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: torvalds@linux-foundation.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com
      2fe59f50
  22. 18 8月, 2017 1 次提交