1. 15 Aug 2018 (1 commit)
  2. 13 Aug 2018 (2 commits)
    • parisc: Drop architecture-specific ENOTSUP define · 93cb8e20
      By Helge Deller
      parisc is the only Linux architecture which has defined a value for ENOTSUP.
      All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers.
      
      Having its own value for ENOTSUP, different from EOPNOTSUPP, often causes
      problems for userspace programs which expect both to be the same.  One such
      example is a build error in the libuv package, as can be seen in
      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237.
      
      Since we dropped HP-UX support, there is no real benefit in keeping a
      separate value for ENOTSUP. This patch drops the parisc value for ENOTSUP
      from the kernel sources. glibc needs no patch; it reuses the exported headers.
      Signed-off-by: Helge Deller <deller@gmx.de>
      93cb8e20
    • init: rename and re-order boot_cpu_state_init() · b5b1404d
      By Linus Torvalds
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5b1404d
  3. 10 Aug 2018 (2 commits)
  4. 09 Aug 2018 (2 commits)
  5. 07 Aug 2018 (1 commit)
    • cpu/hotplug: Fix SMT supported evaluation · bc2d8d26
      By Thomas Gleixner
      Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
      cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
      on the kernel command line as it cannot differentiate between SMT disabled
      by BIOS and SMT soft disable via 'nosmt'. That wrecks the state and
      makes the sysfs interface unusable.
      
      Rework this so that during bringup of the non-boot CPUs the availability of
      SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
      'primary' thread then set the local cpu_smt_available marker and evaluate
      this explicitly right after the initial SMP bringup has finished.
      
      SMT evaluation on x86 is a trainwreck as the firmware has all the
      information _before_ booting the kernel, but there is no interface to query
      it.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      bc2d8d26
  6. 06 Aug 2018 (3 commits)
  7. 03 Aug 2018 (4 commits)
    • nohz: Fix missing tick reprogram when interrupting an inline softirq · 0a0e0829
      By Frederic Weisbecker
      The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
      a nesting interrupt. This stands as an optimization: whether a hardirq or a
      softirq is interrupted, the tick is going to be reprogrammed when necessary
      at the end of the inner interrupt, with even potential new updates on the
      timer queue.
      
      When soft interrupts are interrupted, it's assumed that they are executing
      on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
      called after softirq processing to take care of the tick reprogramming.
      
      But the assumption is wrong: softirqs can also be processed inline, i.e.
      outside of an interrupt, such as in a call to local_bh_enable() or from
      ksoftirqd.
      
      Inline softirqs don't reprogram the tick once they are done, as opposed to
      interrupt tail softirq processing. So if a tick interrupts an inline
      softirq processing, the next timer will neither be reprogrammed from the
      interrupting tick's irq_exit() nor after the interrupted softirq
      processing. This situation may leave the tick unprogrammed while timers are
      armed.
      
      To fix this, simply keep reprogramming the tick even if a softirq has been
      interrupted. That can be optimized further, but for now correctness is more
      important.
      
      Note that new timers enqueued in nohz_full mode after a softirq gets
      interrupted will still be handled just fine through self-IPIs triggered by
      the timer code.
      Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org # 4.14+
      Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
      0a0e0829
    • genirq: Make force irq threading setup more robust · d1f0301b
      By Thomas Gleixner
      The support for force threading interrupts which are set up with both a
      primary and a threaded handler broke the setup of regular requested
      threaded interrupts (primary handler == NULL).
      
      The reason is that it does not check whether the primary handler is set to
      the default handler which wakes the handler thread. Instead it replaces the
      thread handler with the primary handler as it would do with force threaded
      interrupts which have been requested via request_irq(). So both the primary
      and the thread handler become the same, which then triggers the warning
      that the thread handler tries to wake up a not configured secondary thread.
      
      Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
      when requesting the threaded interrupt, which is normally caught by the
      sanity checks when force irq threading is disabled.
      
      Fix it by skipping the force threading setup when a regular threaded
      interrupt is requested. As a consequence the interrupt request which lacks
      the IRQF_ONESHOT flag is rejected correctly instead of being silently
      broken.
      
      Fixes: 2a1d3ab8 ("genirq: Handle force threading of irqs with primary and thread handler")
      Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Cc: stable@vger.kernel.org
      d1f0301b
    • watchdog: Reduce message verbosity · 1b6266eb
      By Sinan Kaya
      The code emits the following error message during boot on systems
      without PMU hardware support while probing NMI capability.
      
       NMI watchdog: Perf event create on CPU 0 failed with -2
      
      This error is emitted as the perf subsystem returns -ENOENT due to lack of
      PMUs in the system.
      
      It is followed by the warning that the NMI watchdog is disabled:
      
        NMI watchdog: Perf NMI watchdog permanently disabled
      
      While the information that the NMI watchdog is disabled is useful for
      ordinary users, seeing that a perf event create failed with error code -2
      is not.
      
      Reduce the message severity to debug so that the error code returned by
      perf is still available for analysis when debugging is enabled.
      Signed-off-by: Sinan Kaya <okaya@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599368
      Link: https://lkml.kernel.org/r/20180803060943.2643-1-okaya@kernel.org
      1b6266eb
    • genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obsolete · 4f7799d9
      By Palmer Dabbelt
      Now that every user of MULTI_IRQ_HANDLER has been converted over to use
      GENERIC_IRQ_MULTI_HANDLER, remove the references to MULTI_IRQ_HANDLER.
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux@armlinux.org.uk
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: jonas@southpole.se
      Cc: stefan.kristiansson@saunalahti.fi
      Cc: shorne@gmail.com
      Cc: jason@lakedaemon.net
      Cc: marc.zyngier@arm.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: nicolas.pitre@linaro.org
      Cc: vladimir.murzin@arm.com
      Cc: keescook@chromium.org
      Cc: jinb.park7@gmail.com
      Cc: yamada.masahiro@socionext.com
      Cc: alexandre.belloni@bootlin.com
      Cc: pombredanne@nexb.com
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: kstewart@linuxfoundation.org
      Cc: jhogan@kernel.org
      Cc: mark.rutland@arm.com
      Cc: ard.biesheuvel@linaro.org
      Cc: james.morse@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: openrisc@lists.librecores.org
      Link: https://lkml.kernel.org/r/20180622170126.6308-6-palmer@sifive.com
      4f7799d9
  8. 02 Aug 2018 (4 commits)
  9. 01 Aug 2018 (2 commits)
  10. 31 Jul 2018 (5 commits)
    • arm64: perf: Add cap_user_time aarch64 · 9d2dcc8f
      By Michael O'Farrell
      It is useful to get the running time of a thread.  Doing so in an
      efficient manner can be important for performance of user applications.
      Avoiding system calls in `clock_gettime` when handling
      CLOCK_THREAD_CPUTIME_ID is important.  Other clocks are handled in the
      VDSO, but CLOCK_THREAD_CPUTIME_ID falls back on the system call.
      
      CLOCK_THREAD_CPUTIME_ID is not handled in the VDSO since it would have
      costs associated with maintaining updated user space accessible time
      offsets.  These offsets have to be updated every time a thread is
      scheduled/descheduled.  However, for programs regularly checking the
      running time of a thread, this is a performance improvement.
      
      This patch takes a middle ground, and adds support for cap_user_time, an
      optional feature of the perf_event API.  This way costs are only
      incurred when the perf_event API is enabled.  This is done the same way
      as it is in x86.
      
      Ultimately this allows calculating the thread running time in userspace
      on aarch64 as follows (adapted from perf_event_open manpage):
      
      u32 seq, time_mult, time_shift;
      u64 running, count, time_offset, quot, rem, delta;
      struct perf_event_mmap_page *pc;
      pc = buf;  // buf is the perf event mmaped page as documented in the API.
      
      if (pc->cap_usr_time) {
          do {
              seq = pc->lock;
              barrier();
              running = pc->time_running;
      
              count = readCNTVCT_EL0();  // Read ARM hardware clock.
              time_offset = pc->time_offset;
              time_mult   = pc->time_mult;
              time_shift  = pc->time_shift;
      
              barrier();
          } while (pc->lock != seq);
      
          quot = (count >> time_shift);
          rem = count & (((u64)1 << time_shift) - 1);
          delta = time_offset + quot * time_mult +
                  ((rem * time_mult) >> time_shift);
      
          running += delta;
          // running now has the current nanosecond level thread time.
      }
      
      Summary of changes in the patch:
      
      For aarch64 systems, make arch_perf_update_userpage update the timing
      information stored in the perf_event page.  Requiring the following
      calculations:
        - Calculate the appropriate time_mult, and time_shift factors to convert
          ticks to nano seconds for the current clock frequency.
        - Adjust the mult and shift factors to avoid shift factors of 32 bits.
          (possibly unnecessary)
        - The time_offset userspace should apply when doing calculations:
          the negative of the current sched time (now), because the time_running
          and time_enabled fields of the perf_event page have just been updated.
      Toggle bits to appropriate values:
        - Enable cap_user_time
      Signed-off-by: Michael O'Farrell <micpof@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      9d2dcc8f
    • audit: fix potential null dereference 'context->module.name' · b305f7ed
      By Yi Wang
      The variable 'context->module.name' may be a null pointer when
      kmalloc returns null, so check it before use to avoid a null
      dereference.
      This patch also uses kstrdup instead of kmalloc + strcpy, and
      signals a lost record via audit_log_lost.
      
      Cc: stable@vger.kernel.org # 4.11
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      b305f7ed
    • cpu/hotplug: Clarify CPU hotplug step name for timers · d018031f
      By Mukesh Ojha
      After commit 249d4a9b3246 ("timers: Reinitialize per cpu bases on hotplug")
      i.e. the introduction of state CPUHP_TIMERS_PREPARE instead of
      CPUHP_TIMERS_DEAD, the step name "timers:dead" is no longer accurate.
      
      Rename it to "timers:prepare".
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: gkohli@codeaurora.org
      Cc: neeraju@codeaurora.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Brendan Jackman <brendan.jackman@arm.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Link: https://lkml.kernel.org/r/1532443668-26810-1-git-send-email-mojha@codeaurora.org
      d018031f
    • sched/clock: Disable interrupts when calling generic_sched_clock_init() · bd9f943e
      By Pavel Tatashin
      sched_clock_init() used to be called early during boot when interrupts were
      still disabled. After the recent changes to utilize sched clock early the
      sched_clock_init() call happens when interrupts are already enabled, which
      triggers the following warning:
      
      WARNING: CPU: 0 PID: 0 at kernel/time/sched_clock.c:180 sched_clock_register+0x44/0x278
      [<c001a13c>] (warn_slowpath_null) from [<c052367c>] (sched_clock_register+0x44/0x278)
      [<c052367c>] (sched_clock_register) from [<c05238d8>] (generic_sched_clock_init+0x28/0x88)
      [<c05238d8>] (generic_sched_clock_init) from [<c0521a00>] (sched_clock_init+0x54/0x74)
      [<c0521a00>] (sched_clock_init) from [<c0519c18>] (start_kernel+0x310/0x3e4)
      [<c0519c18>] (start_kernel) from [<00000000>] (  (null))
      
      Disable IRQs for the duration of generic_sched_clock_init().
      
      Fixes: 857baa87 ("sched/clock: Enable sched clock early")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Link: https://lkml.kernel.org/r/20180730135252.24599-1-pasha.tatashin@oracle.com
      bd9f943e
    • timekeeping: Prevent false warning when persistent clock is not available · 684ad537
      By Pavel Tatashin
      On arches with no persistent clock a message like this is printed during
      boot:
      
      [    0.000000] Persistent clock returned invalid value
      
      The value is not invalid: Zero means that no persistent clock is available
      and the absence of persistent clock should be quietly accepted.
      
      Fixes: 3eca9937 ("timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: sboyd@kernel.org
      Cc: john.stultz@linaro.org
      Link: https://lkml.kernel.org/r/20180725200018.23722-1-pasha.tatashin@oracle.com
      684ad537
  11. 30 Jul 2018 (1 commit)
    • Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables" · 0e664eee
      By Joerg Roedel
      This reverts commit 77754cfa.
      
      The patch was necessary to silence a WARN_ON_ONCE(in_nmi())
      that triggered in the vmalloc_fault() function when PTI was
      enabled on x86-32.
      
      Faulting in an NMI handler turned out to be safe and the
      warning in vmalloc_fault() is gone now. So the above patch
      can be reverted.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1532533683-5988-3-git-send-email-joro@8bytes.org
      0e664eee
  12. 28 Jul 2018 (1 commit)
    • dma-mapping: Generalise dma_32bit_limit flag · f07d141f
      By Robin Murphy
      Whilst the notion of an upstream DMA restriction is most commonly seen
      in PCI host bridges saddled with a 32-bit native interface, a more
      general version of the same issue can exist on complex SoCs where a bus
      or point-to-point interconnect link from a device's DMA master interface
      to another component along the path to memory (often an IOMMU) may carry
      fewer address bits than the interfaces at both ends nominally support.
      In order to properly deal with this, the first step is to expand the
      dma_32bit_limit flag into an arbitrary mask.
      
      To minimise the impact on existing code, we'll make sure to only
      consider this new mask valid if set. That makes sense anyway, since a
      mask of zero would represent DMA not being wired up at all, and that
      would be better handled by not providing valid ops in the first place.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      f07d141f
  13. 27 Jul 2018 (5 commits)
  14. 26 Jul 2018 (4 commits)
    • kthread, tracing: Don't expose half-written comm when creating kthreads · 3e536e22
      By Snild Dolkow
      There is a window for racing when printing directly to task->comm,
      allowing other threads to see a non-terminated string. The vsnprintf
      function fills the buffer, counts the truncated chars, then finally
      writes the \0 at the end.
      
      	creator                     other
      	vsnprintf:
      	  fill (not terminated)
      	  count the rest            trace_sched_waking(p):
      	  ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
      	  write \0
      
      The consequences depend on how 'other' uses the string. In our case,
      it was copied into the tracing system's saved cmdlines, a buffer of
      adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
      
      	crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
      	0xffffffd5b3818640:     "irq/497-pwr_evenkworker/u16:12"
      
      ...and a strcpy out of there would cause stack corruption:
      
      	[224761.522292] Kernel panic - not syncing: stack-protector:
      	    Kernel stack is corrupted in: ffffff9bf9783c78
      
      	crash-arm64> kbt | grep 'comm\|trace_print_context'
      	#6  0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
      	      comm (char [16]) =  "irq/497-pwr_even"
      
      	crash-arm64> rd 0xffffffd4d0e17d14 8
      	ffffffd4d0e17d14:  2f71726900000000 5f7277702d373934   ....irq/497-pwr_
      	ffffffd4d0e17d24:  726f776b6e657665 3a3631752f72656b   evenkworker/u16:
      	ffffffd4d0e17d34:  f9780248ff003231 cede60e0ffffff9b   12..H.x......`..
      	ffffffd4d0e17d44:  cede60c8ffffffd4 00000fffffffffd4   .....`..........
      
      The workaround in e09e2867 (use strlcpy in __trace_find_cmdline) was
      likely needed because of this same bug.
      
      Solved by vsnprintf'ing into a local buffer, then using set_task_comm().
      This way, there won't be a window where comm is not terminated.
      
      Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com
      
      Cc: stable@vger.kernel.org
      Fixes: bc0c38d1 ("ftrace: latency tracer infrastructure")
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Snild Dolkow <snild@sony.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      3e536e22
    • cpu/hotplug: Add a cpus_read_trylock() function · 6f4ceee9
      By Waiman Long
      There are use cases where it can be useful to have a cpus_read_trylock()
      function to work around a circular lock dependency problem involving
      the cpu_hotplug_lock.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      6f4ceee9
    • tracing: Quiet gcc warning about maybe unused link variable · 2519c1bb
      By Steven Rostedt (VMware)
      Commit 57ea2a34 ("tracing/kprobes: Fix trace_probe flags on
      enable_trace_kprobe() failure") added an if statement that depends on
      another if statement which gcc cannot see initializes the "link"
      variable, and so gives the warning:
      
       "warning: 'link' may be used uninitialized in this function"
      
      It is really a false positive, but to quiet the warning, and also to make
      sure that it never actually is used uninitialized, initialize the "link"
      variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
      thinks it could be used uninitialized.
      
      Cc: stable@vger.kernel.org
      Fixes: 57ea2a34 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      2519c1bb
    • tracing: Fix possible double free in event_enable_trigger_func() · 15cc7864
      By Steven Rostedt (VMware)
      There was a case that triggered a double free in event_trigger_callback()
      due to the called reg() function freeing the trigger_data, which was then
      freed again on the caller's error-return path. The solution there was to
      increase the trigger_data ref count.
      
      Code inspection found that event_enable_trigger_func() has the same issue,
      but is not as easy to trigger (requires harder to trigger failures). It
      needs to be solved slightly different as it needs more to clean up when the
      reg() function fails.
      
      Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 7862ad18 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      15cc7864
  15. 25 Jul 2018 (3 commits)
    • tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure · 57ea2a34
      By Artem Savkov
      If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
      it returns an error, but does not unset the tp flags it set previously.
      This results in the probe being considered enabled, and in failures such
      as being unable to remove the probe through the kprobe_events file, since
      probes_open() expects every probe to be disabled.
      
      Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
      Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 41a7dd42 ("tracing/kprobes: Support ftrace_event_file base multibuffer")
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Artem Savkov <asavkov@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      57ea2a34
    • ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 73c8d894
      By Masami Hiramatsu
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but the snapshot changes the tracing_on
      setting each time. This is an anomaly, because the user doesn't know
      that each ring buffer stores its own tracing-enable state and that
      the snapshot is taken by swapping ring buffers.
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      73c8d894
    • tracing: Fix double free of event_trigger_data · 1863c387
      By Steven Rostedt (VMware)
      Running the following:
      
       # cd /sys/kernel/debug/tracing
       # echo 500000 > buffer_size_kb
      [ Or some other number that takes up most of memory ]
       # echo snapshot > events/sched/sched_switch/trigger
      
      Triggers the following bug:
      
       ------------[ cut here ]------------
       kernel BUG at mm/slub.c:296!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:kfree+0x16c/0x180
       Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
       RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
       RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
       RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
       RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
       R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
       R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
       FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
       Call Trace:
        event_trigger_callback+0xee/0x1d0
        event_trigger_write+0xfc/0x1a0
        __vfs_write+0x33/0x190
        ? handle_mm_fault+0x115/0x230
        ? _cond_resched+0x16/0x40
        vfs_write+0xb0/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x5a/0x160
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f363e16ab50
       Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
       RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
       RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
       RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
       R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
       Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
      86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
       ---[ end trace d301afa879ddfa25 ]---
      
      The cause is because the register_snapshot_trigger() call failed to
      allocate the snapshot buffer, and then called unregister_trigger()
       which freed the data that was passed to it. Then, on return to the
       function that called register_snapshot_trigger(), seeing that the
       registration failed, the caller frees the trigger_data again, causing
       a double free.
      
      By calling event_trigger_init() on the trigger_data (which only ups
      the reference counter for it), and then event_trigger_free() afterward,
      the trigger_data would not get freed by the registering trigger function
      as it would only up and lower the ref count for it. If the register
      trigger function fails, then the event_trigger_free() called after it
      will free the trigger data normally.
      
      Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
      
       Cc: stable@vger.kernel.org
      Fixes: 93e31ffb ("tracing: Add 'snapshot' event trigger command")
       Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
       Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
       Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1863c387