1. 04 12月, 2017 1 次提交
  2. 29 11月, 2017 2 次提交
    • V
      powerpc: Do not assign thread.tidr if already assigned · 7e4d4233
      Vaibhav Jain 提交于
      If set_thread_tidr() is called twice for same task_struct then it will
      allocate a new tidr value to it leaving the previous value still
      dangling in the vas_thread_ida table.
      
      To fix this the patch changes set_thread_tidr() to check if a tidr
      value is already assigned to the task_struct and if yes then returns
      zero.
      
      Fixes: ec233ede("powerpc: Add support for setting SPRN_TIDR")
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      [mpe: Modify to return 0 in the success case, not the TID value]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7e4d4233
    • V
      powerpc: Avoid signed to unsigned conversion in set_thread_tidr() · aca7573f
      Vaibhav Jain 提交于
      There is an unsafe signed to unsigned conversion in set_thread_tidr()
      that may cause an error value to be assigned to SPRN_TIDR register and
      used as thread-id.
      
      The issue happens as assign_thread_tidr() returns an int and
      thread.tidr is an unsigned-long. So a negative error code returned
      from assign_thread_tidr() will fail the error check and gets assigned
      as tidr as a large positive value.
      
      To fix this the patch assigns the return value of assign_thread_tidr()
      to a temporary int and assigns it to thread.tidr iff its '> 0'.
      
      The patch shouldn't impact the calling convention of set_thread_tidr()
      i.e all -ve return-values are error codes and a return value of '0'
      indicates success.
      
      Fixes: ec233ede("powerpc: Add support for setting SPRN_TIDR")
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aca7573f
  3. 24 11月, 2017 1 次提交
  4. 22 11月, 2017 2 次提交
    • M
      powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features · 4d6c51b1
      Michael Ellerman 提交于
      I got the logic wrong in the DT CPU features code when I added the
      Power9 DD2.1 feature. We should be setting the bit if we detect a
      DD2.1, not clearing it if we detect a DD2.0.
      
      This code isn't actually exercised at the moment so nothing is
      actually broken.
      
      Fixes: 3ffa9d9e ("powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4d6c51b1
    • K
      treewide: setup_timer() -> timer_setup() · e99e88a9
      Kees Cook 提交于
      This converts all remaining cases of the old setup_timer() API into using
      timer_setup(), where the callback argument is the structure already
      holding the struct timer_list. These should have no behavioral changes,
      since they just change which pointer is passed into the callback with
      the same available pointers after conversion. It handles the following
      examples, in addition to some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback;
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: NKees Cook <keescook@chromium.org>
      e99e88a9
  5. 15 11月, 2017 1 次提交
    • M
      powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature · 3ffa9d9e
      Michael Ellerman 提交于
      Recently we added a CPU feature for Power9 DD2.0, to capture the fact
      that some workarounds are required only on Power9 DD1 and DD2.0 but
      not DD2.1 or later.
      
      Then in commit 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and
      DD2.0 ERAT workaround on DD2.1") and commit e3646330
      "powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
      DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg:
      
        BEGIN_FTR_SECTION
                PPC_INVALIDATE_ERAT
        END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)
      
      Unfortunately although this reads as "if set DD1 or DD2.0", the or is
      a bitwise or and actually generates a mask of both bits. The code that
      does the feature patching then checks that the value of the CPU
      features masked with that mask are equal to the mask.
      
      So the end result is we're checking for DD1 and DD20 being set, which
      never happens. Yes the API is terrible.
      
      Removing the ERAT workaround on DD2.0 results in random SEGVs, the
      system tends to boot, but things randomly die including sometimes
      dhclient, udev etc.
      
      To fix the problem and hopefully avoid it in future, we remove the
      DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
      allows us to easily express that the workarounds are required if DD2.1
      is not set.
      
      At some point we will drop the DD1 workarounds entirely and some of
      this can be cleaned up.
      
      Fixes: 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
      Fixes: e3646330 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3ffa9d9e
  6. 14 11月, 2017 1 次提交
    • M
      powerpc/64s: Fix masking of SRR1 bits on instruction fault · 475b581f
      Michael Ellerman 提交于
      On 64-bit Book3s, when we take an instruction fault the reason for the
      fault may be reported in SRR1. For data faults the reason is reported
      in DSISR (Data Storage Instruction Status Register).
      
      The reasons reported in each do not necessarily correspond, so we mask
      the SRR1 bits before copying them to the DSISR, which is then used by
      the page fault code.
      
      Prior to commit b4c001dc ("powerpc/mm: Use symbolic constants for
      filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000,
      which corresponds to:
      
        DSISR_NOHPTE		0x40000000 /* no translation found */
        DSISR_NOEXEC_OR_G	0x10000000 /* exec of no-exec or guarded */
        DSISR_PROTFAULT	0x08000000 /* protection fault */
        DSISR_KEYFAULT	0x00200000 /* Storage Key fault */
      
      That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but
      incorrectly used a different similarly named DSISR_BAD_FAULT_64S.
      
      This had the effect of changing the mask to 0xa43a0000, which omits
      everything but DSISR_KEYFAULT.
      
      Luckily this had no visible effect, because in practice we hardly use
      the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush
      optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G
      and DSISR_PROTFAULT are both only used to trigger rare warnings.
      
      So we got lucky, but let's fix it. The new value only has bits between
      17 and 30 set, so we can continue to use andis.
      
      Fixes: b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      475b581f
  7. 13 11月, 2017 4 次提交
  8. 12 11月, 2017 6 次提交
  9. 10 11月, 2017 2 次提交
  10. 07 11月, 2017 1 次提交
  11. 06 11月, 2017 13 次提交
    • N
      powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1 · e3646330
      Nicholas Piggin 提交于
      DD2.1 does not have to save MMCR0 for all state-loss idle states,
      only after deep idle states (like other PMU registers).
      Reviewed-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e3646330
    • N
      powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1 · 9d2f510a
      Nicholas Piggin 提交于
      DD2.1 does not have to flush the ERAT after a state-loss idle.
      
      Performance testing was done on a DD2.1 using only the stop0 idle state
      (the shallowest state which supports state loss), using context_switch
      selftest configured to ping-poing between two threads on the same core
      and two different cores.
      
      Performance improvement for same core is 7.0%, different cores is 14.8%.
      Reviewed-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9d2f510a
    • N
      powerpc: add POWER9_DD20 feature · b6b3755e
      Nicholas Piggin 提交于
      Cc: Michael Neuling <mikey@neuling.org>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b6b3755e
    • C
      powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable · 6f700d38
      Cyril Bur 提交于
      After handling a transactional FP, Altivec or VSX unavailable exception.
      The return to userspace code will detect that the TIF_RESTORE_TM bit is
      set and call restore_tm_state(). restore_tm_state() will call
      restore_math() to ensure that the correct facilities are loaded.
      
      This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
      is doing pointless work and can simply be removed.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6f700d38
    • C
      powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint · eb5c3f1c
      Cyril Bur 提交于
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      tm_reclaim() has optimisations to not always save the FP/Altivec
      registers to the checkpointed save area. This was originally done
      because the caller might have information that the checkpointed
      registers aren't valid due to lazy save and restore. We've also been a
      little vague as to how tm_reclaim() leaves the FP/Altivec state since it
      doesn't necessarily always save it to the thread struct. This has lead
      to an (incorrect) assumption that it leaves the checkpointed state on
      the CPU.
      
      tm_recheckpoint() has similar optimisations in reverse. It may not
      always reload the checkpointed FP/Altivec registers from the thread
      struct before the trecheckpoint. It is therefore quite unclear where it
      expects to get the state from. This didn't help with the assumption
      made about tm_reclaim().
      
      These optimisations sit in what is by definition a slow path. If a
      process has to go through a reclaim/recheckpoint then its transaction
      will be doomed on returning to userspace. This mean that the process
      will be unable to complete its transaction and be forced to its failure
      handler. This is already an out if line case for userspace. Furthermore,
      the cost of copying 64 times 128 bits from registers isn't very long[0]
      (at all) on modern processors. As such it appears these optimisations
      have only served to increase code complexity and are unlikely to have
      had a measurable performance impact.
      
      Our transactional memory handling has been riddled with bugs. A cause
      of this has been difficulty in following the code flow, code complexity
      has not been our friend here. It makes sense to remove these
      optimisations in favour of a (hopefully) more stable implementation.
      
      This patch does mean that some times the assembly will needlessly save
      'junk' registers which will subsequently get overwritten with the
      correct value by the C code which calls the assembly function. This
      small inefficiency is far outweighed by the reduction in complexity for
      general TM code, context switching paths, and transactional facility
      unavailable exception handler.
      
      0: I tried to measure it once for other work and found that it was
      hiding in the noise of everything else I was working with. I find it
      exceedingly likely this will be the case here.
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      eb5c3f1c
    • C
      powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception · 91381b9c
      Cyril Bur 提交于
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      tm_reclaim() has optimisations to not always save the FP/Altivec
      registers to the checkpointed save area. This was originally done
      because the caller might have information that the checkpointed
      registers aren't valid due to lazy save and restore. We've also been a
      little vague as to how tm_reclaim() leaves the FP/Altivec state since it
      doesn't necessarily always save it to the thread struct. This has lead
      to an (incorrect) assumption that it leaves the checkpointed state on
      the CPU.
      
      tm_recheckpoint() has similar optimisations in reverse. It may not
      always reload the checkpointed FP/Altivec registers from the thread
      struct before the trecheckpoint. It is therefore quite unclear where it
      expects to get the state from. This didn't help with the assumption
      made about tm_reclaim().
      
      This patch is a minimal fix for ease of backporting. A more correct fix
      which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
      altogether has been upstreamed to apply on top of this patch.
      
      Fixes: dc310669 ("powerpc: tm: Always use fp_state and vr_state to
      store live registers")
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      91381b9c
    • C
      powerpc: Don't enable FP/Altivec if not checkpointed · a7771176
      Cyril Bur 提交于
      Lazy save and restore of FP/Altivec means that a userspace process can
      be sent to userspace with FP or Altivec disabled and loaded only as
      required (by way of an FP/Altivec unavailable exception). Transactional
      Memory complicates this situation as a transaction could be started
      without FP/Altivec being loaded up. This causes the hardware to
      checkpoint incorrect registers. Handling FP/Altivec unavailable
      exceptions while a thread is transactional requires a reclaim and
      recheckpoint to ensure the CPU has correct state for both sets of
      registers.
      
      Lazy save and restore of FP/Altivec cannot be done if a process is
      transactional. If a facility was enabled it must remain enabled whenever
      a thread is transactional.
      
      Commit dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
      transactional memory in use") ensures that the facilities are always
      enabled if a thread is transactional. A bug in the introduced code may
      cause it to inadvertently enable a facility that was (and should remain)
      disabled. The problem with this extraneous enablement is that the
      registers for the erroneously enabled facility have not been correctly
      recheckpointed - the recheckpointing code assumed the facility would
      remain disabled.
      
      Further compounding the issue, the transactional {fp,altivec,vsx}
      unavailable code has been incorrectly using the MSR to enable
      facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
      means if the registers are live on the CPU, not if the kernel should
      load them before returning to userspace. This has worked due to the bug
      mentioned above.
      
      This causes transactional threads which return to their failure handler
      to observe incorrect checkpointed registers. Perhaps an example will
      help illustrate the problem:
      
      A userspace process is running and uses both FP and Altivec registers.
      This process then continues to run for some time without touching
      either sets of registers. The kernel subsequently disables the
      facilities as part of lazy save and restore. The userspace process then
      performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
      registers. The process then performs a floating point instruction
      triggering a fp unavailable exception in the kernel.
      
      The kernel then loads the FP registers - and only the FP registers.
      Since the thread is transactional it must perform a reclaim and
      recheckpoint to ensure both the checkpointed registers and the
      transactional registers are correct. It then (correctly) enables
      MSR[FP] for the process. Later (on exception exist) the kernel also
      (inadvertently) enables MSR[VEC]. The process is then returned to
      userspace.
      
      Since the act of loading the FP registers doomed the transaction we know
      CPU will fail the transaction, restore its checkpointed registers, and
      return the process to its failure handler. The problem is that we're
      now running with Altivec enabled and the 'junk' checkpointed registers
      are restored. The kernel had only recheckpointed FP.
      
      This patch solves this by only activating FP/Altivec if userspace was
      using them when it entered the kernel and not simply if the process is
      transactional.
      
      Fixes: dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
      transactional memory in use")
      Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a7771176
    • A
      powerpc/eeh: Stop using do_gettimeofday() · edfd17ff
      Arnd Bergmann 提交于
      This interface is inefficient and deprecated because of the y2038
      overflow.
      
      ktime_get_seconds() is an appropriate replacement here, since it
      has sufficient granularity but is more efficient and uses monotonic
      time.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      edfd17ff
    • M
      powerpc/tm: Don't check for WARN in TM Bad Thing handling · 632f0574
      Michael Ellerman 提交于
      Currently when we take a TM Bad Thing program check exception, we
      search the bug table to see if the program check was generated by a
      WARN/WARN_ON etc.
      
      That makes no sense, the WARN macros use trap instructions, which
      should never generate a TM Bad Thing exception. If they ever did that
      would be a bug and we should oops.
      
      We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but
      those are all BUGs not WARNs, and they all use trap instructions
      anyway. Almost certainly this check was incorrectly copied from the
      REASON_TRAP handling in the same function.
      
      Remove it.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-By: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      632f0574
    • M
      powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Michael Ellerman 提交于
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use, however that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4e003747
    • M
      powerpc/64: Free up CPU_FTR_ICSWX · c1807e3f
      Michael Ellerman 提交于
      The last user of CPU_FTR_ICSWX was removed in commit
      6ff4d3e9 ("powerpc: Remove old unused icswx based coprocessor
      support"), so free the bit up for future use.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c1807e3f
    • N
      powerpc/64: Fix latency tracing for lazy irq replay · ff967900
      Nicholas Piggin 提交于
      When returning from an exception to a soft-enabled context, pending
      IRQs are replayed but IRQ tracing is not reset, so a number of them
      can get chained together into the same IRQ-disabled trace.
      
      Fix this by having __check_irq_replay re-set IRQ trace. This is
      conceptually where we respond to the next interrupt, so it fits the
      semantics of the IRQ tracer.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ff967900
    • N
      KVM: PPC: Book3S HV: Handle host system reset in guest mode · 6de6638b
      Nicholas Piggin 提交于
      If the host takes a system reset interrupt while a guest is running,
      the CPU must exit the guest before processing the host exception
      handler.
      
      After this patch, taking a sysrq+x with a CPU running in a guest
      gives a trace like this:
      
         cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0]
             pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
             lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
             sp: c000000fdf577850
            msr: 9000000002803033
           current = 0xc000000fdf4b1e00
           paca    = 0xc00000000fd4d680	 softe: 3	 irq_happened: 0x01
             pid   = 6608, comm = qemu-system-ppc
         Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP
         [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv]
         [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm]
         [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm]
         [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm]
         [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0
         [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100
         [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c
         --- Exception: c01 (System Call) at 00007fff76868df0
         SP (7fff7069baf0) is in userspace
      
      Fixes: e36d0a2e ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6de6638b
  12. 03 11月, 2017 1 次提交
    • K
      powerpc/watchdog: Convert timers to use timer_setup() · 5943cf4a
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      5943cf4a
  13. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  14. 01 11月, 2017 2 次提交
    • N
      powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text · e6c4dcb3
      Naveen N. Rao 提交于
      This makes the changes introduced in commit 83e840c7
      ("powerpc64/elfv1: Only dereference function descriptor for non-text
      symbols") to be specific to the kprobe subsystem.
      
      We previously changed ppc_function_entry() to always check the provided
      address to confirm if it needed to be dereferenced. This is actually
      only an issue for kprobe blacklisted asm labels (through use of
      _ASM_NOKPROBE_SYMBOL) and can cause other issues with ftrace. Also, the
      additional checks are not really necessary for our other uses.
      
      As such, move this check to the kprobes subsystem.
      
      Fixes: 83e840c7 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e6c4dcb3
    • P
      KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts · c0101509
      Paul Mackerras 提交于
      This patch removes the restriction that a radix host can only run
      radix guests, allowing us to run HPT (hashed page table) guests as
      well.  This is useful because it provides a way to run old guest
      kernels that know about POWER8 but not POWER9.
      
      Unfortunately, POWER9 currently has a restriction that all threads
      in a given code must either all be in HPT mode, or all in radix mode.
      This means that when entering a HPT guest, we have to obtain control
      of all 4 threads in the core and get them to switch their LPIDR and
      LPCR registers, even if they are not going to run a guest.  On guest
      exit we also have to get all threads to switch LPIDR and LPCR back
      to host values.
      
      To make this feasible, we require that KVM not be in the "independent
      threads" mode, and that the CPU cores be in single-threaded mode from
      the host kernel's perspective (only thread 0 online; threads 1, 2 and
      3 offline).  That allows us to use the same code as on POWER8 for
      obtaining control of the secondary threads.
      
      To manage the LPCR/LPIDR changes required, we extend the kvm_split_info
      struct to contain the information needed by the secondary threads.
      All threads perform a barrier synchronization (where all threads wait
      for every other thread to reach the synchronization point) on guest
      entry, both before and after loading LPCR and LPIDR.  On guest exit,
      they all once again perform a barrier synchronization both before
      and after loading host values into LPCR and LPIDR.
      
      Finally, it is also currently necessary to flush the entire TLB every
      time we enter a HPT guest on a radix host.  We do this on thread 0
      with a loop of tlbiel instructions.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      c0101509
  15. 25 10月, 2017 1 次提交
  16. 22 10月, 2017 1 次提交
    • M
      powerpc: Disable the fast-endian switch syscall by default · 727f1361
      Michael Ellerman 提交于
      Back in 2008 we added support for "fast little-endian switch" in the
      syscall path. This added a special case syscall number 0x1ebe, which
      is caught very early in the system call exception and switches endian
      with as little overhead as possible. See commit 745a14cc
      ("[POWERPC] Add fast little-endian switch system call") for full
      details.
      
      Although it is fast, it's also completely non standard. The "syscall
      number" is out of the range of normal syscalls, it can't be traced or
      audited, and it's a bit of a wart. To the best of our knowledge it was
      only used by one program, now long since discontinued.
      
      So in an effort to shake out any current users, put it behind a config
      option, and make it default n. If anyone *is* using it they can
      quickly reinstate it with a rebuild, and we can flip it to default y.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      727f1361