1. 30 10月, 2019 28 次提交
  2. 29 10月, 2019 2 次提交
    • P
      tracing: Fix race in perf_trace_buf initialization · 5ce7528c
      Prateek Sood 提交于
      commit 6b1340cc00edeadd52ebd8a45171f38c8de2a387 upstream.
      
      A race condition exists while initialiazing perf_trace_buf from
      perf_trace_init() and perf_kprobe_init().
      
            CPU0                                        CPU1
      perf_trace_init()
        mutex_lock(&event_mutex)
          perf_trace_event_init()
            perf_trace_event_reg()
              total_ref_count == 0
      	buf = alloc_percpu()
              perf_trace_buf[i] = buf
              tp_event->class->reg() //fails       perf_kprobe_init()
      	goto fail                              perf_trace_event_init()
                                                       perf_trace_event_reg()
              fail:
      	  total_ref_count == 0
      
                                                         total_ref_count == 0
                                                         buf = alloc_percpu()
                                                         perf_trace_buf[i] = buf
                                                         tp_event->class->reg()
                                                         total_ref_count++
      
                free_percpu(perf_trace_buf[i])
                perf_trace_buf[i] = NULL
      
      Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
      causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
      getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
      event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
      fix this race.
      
      The race caused the following bug:
      
       Unable to handle kernel paging request at virtual address 0000003106f2003c
       Mem abort info:
         ESR = 0x96000045
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000045
         CM = 0, WnR = 1
       user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
       [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
       Internal error: Oops: 96000045 [#1] PREEMPT SMP
       Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
       pstate: 80400005 (Nzcv daif +PAN -UAO)
       pc : __memset+0x20/0x1ac
       lr : memset+0x3c/0x50
       sp : ffffffc09319fc50
      
        __memset+0x20/0x1ac
        perf_trace_buf_alloc+0x140/0x1a0
        perf_trace_sys_enter+0x158/0x310
        syscall_trace_enter+0x348/0x7c0
        el0_svc_common+0x11c/0x368
        el0_svc_handler+0x12c/0x198
        el0_svc+0x8/0xc
      
      Ramdumps showed the following:
        total_ref_count = 3
        perf_trace_buf = (
            0x0 -> NULL,
            0x0 -> NULL,
            0x0 -> NULL,
            0x0 -> NULL)
      
      Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org
      
      Cc: stable@vger.kernel.org
      Fixes: e12f03d7 ("perf/core: Implement the 'perf_kprobe' PMU")
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NPrateek Sood <prsood@codeaurora.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ce7528c
    • A
      perf/aux: Fix AUX output stopping · 96202569
      Alexander Shishkin 提交于
      commit f3a519e4add93b7b31a6616f0b09635ff2e6a159 upstream.
      
      Commit:
      
        8a58ddae2379 ("perf/core: Fix exclusive events' grouping")
      
      allows CAP_EXCLUSIVE events to be grouped with other events. Since all
      of those also happen to be AUX events (which is not the case the other
      way around, because arch/s390), this changes the rules for stopping the
      output: the AUX event may not be on its PMU's context any more, if it's
      grouped with a HW event, in which case it will be on that HW event's
      context instead. If that's the case, munmap() of the AUX buffer can't
      find and stop the AUX event, potentially leaving the last reference with
      the atomic context, which will then end up freeing the AUX buffer. This
      will then trip warnings:
      
      Fix this by using the context's PMU context when looking for events
      to stop, instead of the event's PMU context.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96202569
  3. 18 10月, 2019 7 次提交
  4. 12 10月, 2019 3 次提交
    • B
      tick: broadcast-hrtimer: Fix a race in bc_set_next · e5331c37
      Balasubramani Vivekanandan 提交于
      [ Upstream commit b9023b91dd020ad7e093baa5122b6968c48cc9e0 ]
      
      When a cpu requests broadcasting, before starting the tick broadcast
      hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
      using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
      the required synchronization when the callback is active on other core.
      
      The callback could have already executed tick_handle_oneshot_broadcast()
      and could have also returned. But still there is a small time window where
      the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
      without doing anything, but the next_event of the tick broadcast clock
      device is already set to a timeout value.
      
      In the race condition diagram below, CPU #1 is running the timer callback
      and CPU #2 is entering idle state and so calls bc_set_next().
      
      In the worst case, the next_event will contain an expiry time, but the
      hrtimer will not be started which happens when the racing callback returns
      HRTIMER_NORESTART. The hrtimer might never recover if all further requests
      from the CPUs to subscribe to tick broadcast have timeout greater than the
      next_event of tick broadcast clock device. This leads to cascading of
      failures and finally noticed as rcu stall warnings
      
      Here is a depiction of the race condition
      
      CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                        and subscribe to
                                                        tick broadcast)
      ---------------------                             ---------------------
      
      __run_hrtimer()                                   tick_broadcast_enter()
      
        bc_handler()                                      __tick_broadcast_oneshot_control()
      
          tick_handle_oneshot_broadcast()
      
            raw_spin_lock(&tick_broadcast_lock);
      
            dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
            //next_event for tick broadcast clock
            set to KTIME_MAX since no other cores
            subscribed to tick broadcasting
      
            raw_spin_unlock(&tick_broadcast_lock);
      
          if (dev->next_event == KTIME_MAX)
            return HRTIMER_NORESTART
          // callback function exits without
             restarting the hrtimer                      //tick_broadcast_lock acquired
                                                         raw_spin_lock(&tick_broadcast_lock);
      
                                                         tick_broadcast_set_event()
      
                                                           clockevents_program_event()
      
                                                             dev->next_event = expires;
      
                                                             bc_set_next()
      
                                                               hrtimer_try_to_cancel()
                                                               //returns -1 since the timer
                                                               callback is active. Exits without
                                                               restarting the timer
        cpu_base->running = NULL;
      
      The comment that hrtimer cannot be armed from within the callback is
      wrong. It is fine to start the hrtimer from within the callback. Also it is
      safe to start the hrtimer from the enter/exit idle code while the broadcast
      handler is active. The enter/exit idle code and the broadcast handler are
      synchronized using tick_broadcast_lock. So there is no need for the
      existing try to cancel logic. All this can be removed which will eliminate
      the race condition as well.
      
      Fixes: 5d1638ac ("tick: Introduce hrtimer based broadcast")
      Originally-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBalasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      e5331c37
    • V
      kernel/elfcore.c: include proper prototypes · b0aaf65b
      Valdis Kletnieks 提交于
      [ Upstream commit 0f74914071ab7e7b78731ed62bf350e3a344e0a5 ]
      
      When building with W=1, gcc properly complains that there's no prototypes:
      
        CC      kernel/elfcore.o
      kernel/elfcore.c:7:17: warning: no previous prototype for 'elf_core_extra_phdrs' [-Wmissing-prototypes]
          7 | Elf_Half __weak elf_core_extra_phdrs(void)
            |                 ^~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:12:12: warning: no previous prototype for 'elf_core_write_extra_phdrs' [-Wmissing-prototypes]
         12 | int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:17:12: warning: no previous prototype for 'elf_core_write_extra_data' [-Wmissing-prototypes]
         17 | int __weak elf_core_write_extra_data(struct coredump_params *cprm)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/elfcore.c:22:15: warning: no previous prototype for 'elf_core_extra_data_size' [-Wmissing-prototypes]
         22 | size_t __weak elf_core_extra_data_size(void)
            |               ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Provide the include file so gcc is happy, and we don't have potential code drift
      
      Link: http://lkml.kernel.org/r/29875.1565224705@turing-policeSigned-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b0aaf65b
    • K
      sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() · 46ff0e2f
      KeMeng Shi 提交于
      [ Upstream commit 714e501e16cd473538b609b3e351b2cc9f7f09ed ]
      
      An oops can be triggered in the scheduler when running qemu on arm64:
      
       Unable to handle kernel paging request at virtual address ffff000008effe40
       Internal error: Oops: 96000007 [#1] SMP
       Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
       pstate: 20000085 (nzCv daIf -PAN -UAO)
       pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
       lr : move_queued_task.isra.21+0x124/0x298
       ...
       Call trace:
        __ll_sc___cmpxchg_case_acq_4+0x4/0x20
        __migrate_task+0xc8/0xe0
        migration_cpu_stop+0x170/0x180
        cpu_stopper_thread+0xec/0x178
        smpboot_thread_fn+0x1ac/0x1e8
        kthread+0x134/0x138
        ret_from_fork+0x10/0x18
      
      __set_cpus_allowed_ptr() will choose an active dest_cpu in affinity mask to
      migrage the process if process is not currently running on any one of the
      CPUs specified in affinity mask. __set_cpus_allowed_ptr() will choose an
      invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
      CPUS in an affinity mask are deactived by cpu_down after cpumask_intersects
      check. cpumask_test_cpu() of dest_cpu afterwards is overflown and may pass if
      corresponding bit is coincidentally set. As a consequence, kernel will
      access an invalid rq address associate with the invalid CPU in
      migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs.
      
      The reproduce the crash:
      
        1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
        sched_setaffinity.
      
        2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
        and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
      
        3) Oops appears if the invalid CPU is set in memory after tested cpumask.
      Signed-off-by: NKeMeng Shi <shikemeng@huawei.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NValentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      46ff0e2f