• P
    perf: Fix missing SIGTRAPs · ca6c2132
    Peter Zijlstra 提交于
    Marco reported:
    
    Due to the implementation of how SIGTRAP are delivered if
    perf_event_attr::sigtrap is set, we've noticed 3 issues:
    
      1. Missing SIGTRAP due to a race with event_sched_out() (more
         details below).
    
      2. Hardware PMU events being disabled due to returning 1 from
         perf_event_overflow(). The only way to re-enable the event is
         for user space to first "properly" disable the event and then
         re-enable it.
    
      3. The inability to automatically disable an event after a
         specified number of overflows via PERF_EVENT_IOC_REFRESH.
    
    The worst of the 3 issues is problem (1), which occurs when a
    pending_disable is "consumed" by a racing event_sched_out(), observed
    as follows:
    
    		CPU0			|	CPU1
    	--------------------------------+---------------------------
    	__perf_event_overflow()		|
    	 perf_event_disable_inatomic()	|
    	  pending_disable = CPU0	| ...
    					| _perf_event_enable()
    					|  event_function_call()
    					|   task_function_call()
    					|    /* sends IPI to CPU0 */
    	<IPI>				| ...
    	 __perf_event_enable()		+---------------------------
    	  ctx_resched()
    	   task_ctx_sched_out()
    	    ctx_sched_out()
    	     group_sched_out()
    	      event_sched_out()
    	       pending_disable = -1
    	</IPI>
    	<IRQ-work>
    	 perf_pending_event()
    	  perf_pending_event_disable()
    	   /* Fails to send SIGTRAP because no pending_disable! */
    	</IRQ-work>
    
    In the above case, not only is that particular SIGTRAP missed, but also
    all future SIGTRAPs because 'event_limit' is not reset back to 1.
    
    To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
    of a separate 'pending_sigtrap', no longer using 'event_limit' and
    'pending_disable' for its delivery.
    
    Additionally; and different to Marco's proposed patch:
    
     - recognise that pending_disable effectively duplicates oncpu for
       the case where it is set. As such, change the irq_work handler to
       use ->oncpu to target the event and use pending_* as boolean toggles.
    
     - observe that SIGTRAP targets the ctx->task, so the context switch
       optimization that carries contexts between tasks is invalid. If
       the irq_work were delayed enough to hit after a context switch the
       SIGTRAP would be delivered to the wrong task.
    
     - observe that if the event gets scheduled out
       (rotation/migration/context-switch/...) the irq-work would be
       insufficient to deliver the SIGTRAP when the event gets scheduled
       back in (the irq-work might still be pending on the old CPU).
    
       Therefore have event_sched_out() convert the pending sigtrap into a
       task_work which will deliver the signal at return_to_user.
    
    Fixes: 97ba62b2 ("perf: Add support for SIGTRAP on perf events")
    Reported-by: NDmitry Vyukov <dvyukov@google.com>
    Debugged-by: NDmitry Vyukov <dvyukov@google.com>
    Reported-by: NMarco Elver <elver@google.com>
    Debugged-by: NMarco Elver <elver@google.com>
    Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: NMarco Elver <elver@google.com>
    Tested-by: NMarco Elver <elver@google.com>
    ca6c2132
perf_event.h 49.9 KB