1. 02 Aug, 2018 2 commits
  2. 25 Jul, 2018 1 commit
    • M
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 73c8d894
      Committed by Masami Hiramatsu
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but the snapshot changes the tracing_on
      setting each time. This is an anomaly, because the user doesn't know
      that each "ring_buffer" stores its own tracing-enable state and
      that the snapshot is done by swapping ring buffers.
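
The toggling described above can be reproduced with a toy model. This is a minimal sketch, not the kernel code: `RingBuffer`, `TraceArray`, and the `inherit` flag are hypothetical names, and the only behavior taken from the commit message is that each buffer owns its on/off flag and that a snapshot swaps the two buffers.

```python
class RingBuffer:
    """Toy stand-in for a ring buffer that owns its own on/off flag."""

    def __init__(self, enabled=True):
        self.enabled = enabled


class TraceArray:
    """Holds the live buffer and the backup (snapshot) buffer."""

    def __init__(self):
        self.buffer = RingBuffer()
        self.max_buffer = RingBuffer()

    def tracing_on(self):
        # What "cat tracing_on" would report: the flag of the live buffer.
        return int(self.buffer.enabled)

    def set_tracing_on(self, value):
        self.buffer.enabled = bool(value)

    def snapshot(self, inherit=False):
        if inherit:
            # Assumed fix: carry the live setting over to the buffer
            # that is about to become live.
            self.max_buffer.enabled = self.buffer.enabled
        self.buffer, self.max_buffer = self.max_buffer, self.buffer
```

Without `inherit`, two snapshots in a row flip the reported value from 0 to 1 and back to 0, as in the shell session above. With `inherit=True` the value stays at 0.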
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  3. 05 Jun, 2018 1 commit
  4. 06 Apr, 2018 3 commits
  5. 11 Mar, 2018 3 commits
  6. 12 Feb, 2018 1 commit
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
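
For readers who find the sed expression hard to parse, the per-line rewrite can be sketched in Python. This is only an illustration of the substitution itself, under the assumption that `\b` matches the same word boundaries as sed's `\<` and `\>`. The function name `epollify_line` is made up, and the `git grep` file filtering from the script is not modeled.

```python
import re

POLL_SUFFIXES = ["IN", "OUT", "PRI", "ERR", "RDNORM", "RDBAND",
                 "WRNORM", "WRBAND", "HUP", "RDHUP", "NVAL", "MSG"]


def epollify_line(line):
    """Prefix POLL* constants with 'E' unless a double quote precedes them."""
    for suffix in POLL_SUFFIXES:
        # ^([^"]*) only allows text without a double quote before the
        # constant, so names inside string literals are left alone.
        pattern = r'^([^"]*)(\bPOLL' + suffix + r'\b)'
        # Anchored at the start of the line, so at most one occurrence of
        # each constant is rewritten per pass, like the sed command.
        line = re.sub(pattern, r'\g<1>E\g<2>', line, count=1)
    return line
```

For example, `epollify_line("mask |= POLLIN | POLLRDNORM;")` gives `mask |= EPOLLIN | EPOLLRDNORM;`, while a line such as `pr_debug("POLLIN");` is returned unchanged.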
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 19 Jan, 2018 1 commit
    • S
      ring-buffer: Fix duplicate results in mapping context to bits in recursive lock · 0164e0d7
      Committed by Steven Rostedt (VMware)
      In bringing back the context checks, the code checks first if it's normal
      (non-interrupt) context, and then for NMI then IRQ then softirq. The final
      check is redundant. Since the if branch is only hit if the context is one of
      NMI, IRQ, or SOFTIRQ, if it's not NMI or IRQ there's no reason to check if
      it is SOFTIRQ. The current code returns the same result even if it's not a
      SOFTIRQ, which is confusing.
      
        pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ
      
      Is redundant as RB_CTX_SOFTIRQ *is* 2!
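
The redundancy can be checked with a small model of the mapping. This is a sketch under assumptions: the mask values below follow the usual preempt_count layout but are not stated in the commit message, and only `RB_CTX_SOFTIRQ == 2` comes from the text. The other enum values and both function names are illustrative.

```python
# Assumed preempt_count layout (not given in the commit message).
SOFTIRQ_OFFSET = 1 << 8
HARDIRQ_MASK = 0xF << 16
NMI_MASK = 1 << 20

# Only RB_CTX_SOFTIRQ == 2 is stated in the text; the rest are assumed.
RB_CTX_NMI, RB_CTX_IRQ, RB_CTX_SOFTIRQ, RB_CTX_NORMAL = 0, 1, 2, 3


def context_bit_redundant(pc):
    """Mapping with the extra softirq test described above."""
    if not pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET):
        return RB_CTX_NORMAL
    if pc & NMI_MASK:
        return RB_CTX_NMI
    if pc & HARDIRQ_MASK:
        return RB_CTX_IRQ
    return 2 if pc & SOFTIRQ_OFFSET else RB_CTX_SOFTIRQ


def context_bit(pc):
    """Same mapping without the final test."""
    if not pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET):
        return RB_CTX_NORMAL
    if pc & NMI_MASK:
        return RB_CTX_NMI
    if pc & HARDIRQ_MASK:
        return RB_CTX_IRQ
    return RB_CTX_SOFTIRQ
```

Both functions return the same value for every input, since the last branch is only reached when the softirq bit is the one that is set.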
      
      Fixes: a0e3a18f ("ring-buffer: Bring back context level recursive checks")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  8. 16 Jan, 2018 1 commit
    • S
      ring-buffer: Bring back context level recursive checks · a0e3a18f
      Committed by Steven Rostedt (VMware)
      Commit 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
      simpler") replaced the context level recursion checks with a simple counter.
      This would prevent the ring buffer code from recursively calling itself more
      than the max number of contexts that exist (Normal, softirq, irq, nmi). But
      this change caused a lockup in a specific case, which was during suspend and
      resume using a global clock. Adding a stack dump to see where this occurred,
      the issue was in the trace global clock itself:
      
        trace_buffer_lock_reserve+0x1c/0x50
        __trace_graph_entry+0x2d/0x90
        trace_graph_entry+0xe8/0x200
        prepare_ftrace_return+0x69/0xc0
        ftrace_graph_caller+0x78/0xa8
        queued_spin_lock_slowpath+0x5/0x1d0
        trace_clock_global+0xb0/0xc0
        ring_buffer_lock_reserve+0xf9/0x390
      
      The function graph tracer traced queued_spin_lock_slowpath that was called
      by trace_clock_global. This pointed out that trace_clock_global() is not
      reentrant, as it takes a spin lock. It depended on the ring buffer recursive
      lock to prevent that from happening.
      
      By removing the context detection and adding just a max number of allowable
      recursions, it allowed the trace_clock_global() to be entered again and try
      to retake the spinlock it already held, causing a deadlock.
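
The difference between the two schemes can be sketched as follows. This is a simplified model with hypothetical class names, not the kernel implementation. It only shows why a per-context bit rejects a second entry from the same context while a plain depth counter lets it through.

```python
class ContextBitLock:
    """One bit per context: a second entry from the same context fails."""

    def __init__(self):
        self.held = 0

    def lock(self, ctx_bit):
        mask = 1 << ctx_bit
        if self.held & mask:
            return False  # recursion detected in this context
        self.held |= mask
        return True

    def unlock(self, ctx_bit):
        self.held &= ~(1 << ctx_bit)


class CounterLock:
    """Plain depth counter: any entry succeeds until the limit is hit."""

    MAX_DEPTH = 4

    def __init__(self):
        self.depth = 0

    def lock(self, ctx_bit):
        # The context is ignored, which is what allowed the reentry.
        if self.depth >= self.MAX_DEPTH:
            return False
        self.depth += 1
        return True

    def unlock(self, ctx_bit):
        self.depth -= 1


def reenter_same_context(lock, ctx_bit=3):
    """Return True if a nested entry from the same context is allowed."""
    lock.lock(ctx_bit)            # outer entry, e.g. reserving an event
    nested = lock.lock(ctx_bit)   # tracer re-enters while the clock lock is held
    if nested:
        lock.unlock(ctx_bit)
    lock.unlock(ctx_bit)
    return nested
```

In this model the nested entry is the point where trace_clock_global() would be called again with its spin lock already held. `ContextBitLock` refuses it and `CounterLock` allows it.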
      
      Fixes: 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
      Reported-by: David Weinehall <david.weinehall@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  9. 28 Dec, 2017 2 commits
    • S
      ring-buffer: Do no reuse reader page if still in use · ae415fa4
      Committed by Steven Rostedt (VMware)
      To free the reader page that is allocated with ring_buffer_alloc_read_page(),
      ring_buffer_free_read_page() must be called. For faster performance, this
      page can be reused by the ring buffer to avoid having to free and allocate
      new pages.
      
      The issue arises when the page is used with a splice pipe into the
      networking code. The networking code may up the page counter for the page,
      and keep it active while it is queued to go to the network. The
      incrementing of the page ref does not prevent it from being reused in the
      ring buffer, and this can cause the page that is being sent out to the
      network to be overwritten with new data before it is sent.
      
      Add a check to the page ref counter, and only reuse the page if it is not
      being used anywhere else.
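
A rough model of the added check, assuming a page object with a simple reference count. `Page`, `CpuBuffer`, and the single cached slot are illustrative and not taken from the kernel source.

```python
class Page:
    """Toy page with a reference count, starting with one owner."""

    def __init__(self):
        self.refcount = 1


class CpuBuffer:
    """Keeps at most one returned reader page around for reuse."""

    def __init__(self):
        self.free_page = None

    def alloc_read_page(self):
        if self.free_page is not None:
            page, self.free_page = self.free_page, None
            return page
        return Page()

    def free_read_page(self, page):
        if page.refcount > 1 or self.free_page is not None:
            # Someone else (e.g. the networking code) still holds the page,
            # or the slot is taken: just drop our reference, never reuse.
            page.refcount -= 1
            return
        self.free_page = page
```

A page whose count was raised by another user is never handed out again by `alloc_read_page()`, so its contents cannot change underneath that user.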
      
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • S
      ring-buffer: Mask out the info bits when returning buffer page length · 45d8b80c
      Committed by Steven Rostedt (VMware)
      Two info bits were added to the "commit" part of the ring buffer data page
      when returned to be consumed. This was to inform the user space readers that
      events have been missed, and that the count may be stored at the end of the
      page.
      
      What wasn't handled was the splice code, which calls a function to
      return the length of the data in order to zero out the rest of the page
      before sending it up to user space. The info bits were returned with the
      length, making the value negative, and that negative value was not checked.
      It was compared to PAGE_SIZE, and only used if the size was less than
      PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
      unsigned compare, meaning the negative size value did not end up causing a
      large portion of memory to be randomly zeroed out.
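
The effect of the info bits on the returned length can be illustrated as below. This is a sketch under assumptions: the commit message does not say which bits are used, so the two top bits of a 32-bit value are assumed here, and the helper names are made up.

```python
# Assumed positions of the two info bits (not stated in the commit message).
RB_MISSED_EVENTS = 1 << 31
RB_MISSED_STORED = 1 << 30
RB_MISSED_FLAGS = RB_MISSED_EVENTS | RB_MISSED_STORED


def as_signed32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


def page_len_unmasked(commit):
    """Length as the splice path saw it: the flags are still included."""
    return as_signed32(commit)


def page_len(commit):
    """Length with the info bits masked out."""
    return commit & ~RB_MISSED_FLAGS
```

With `commit = 100 | RB_MISSED_EVENTS`, the unmasked helper returns a negative number, while `page_len()` returns 100.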
      
      Cc: stable@vger.kernel.org
      Fixes: 66a8cb95 ("ring-buffer: Add place holder recording of dropped events")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  10. 04 Dec, 2017 1 commit
  11. 29 Nov, 2017 1 commit
  12. 16 Nov, 2017 1 commit
  13. 25 Oct, 2017 1 commit
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Committed by Mark Rutland
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
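
As a rough illustration of what the two rules do to source text, here is a regex-based approximation. It is not a substitute for coccinelle: it only handles arguments without nested parentheses and simple `=` assignments ending in `;`, and the function name is made up.

```python
import re

# ACCESS_ONCE(x) = y;  ->  WRITE_ONCE(x, y);   ("==" is excluded)
_WRITE = re.compile(r'ACCESS_ONCE\(([^()]*)\)\s*=(?!=)\s*([^;]+);')
# Any remaining ACCESS_ONCE(x)  ->  READ_ONCE(x)
_READ = re.compile(r'ACCESS_ONCE\(([^()]*)\)')


def convert_access_once(src):
    """Apply the write rule first, then the read rule, as in the script."""
    src = _WRITE.sub(r'WRITE_ONCE(\1, \2);', src)
    return _READ.sub(r'READ_ONCE(\1)', src)
```

For example, `ACCESS_ONCE(p->state) = 1;` becomes `WRITE_ONCE(p->state, 1);` and `if (ACCESS_ONCE(a) == 2)` becomes `if (READ_ONCE(a) == 2)`.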
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 04 Oct, 2017 1 commit
    • S
      ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler · 1a149d7d
      Committed by Steven Rostedt (VMware)
      The current method to prevent the ring buffer from entering into a recursive
      loop is to use a bitmask and set the bit that maps to the current context
      (normal, softirq, irq or NMI), and if that bit was already set, it is
      considered a recursive loop.
      
      New code is being added that may require the ring buffer to be entered a
      second time in the current context. The recursive locking prevents that from
      happening. Instead of mapping a bitmask to the current context, just allow 4
      levels of nesting in the ring buffer. This matches the 4 context levels that
      it can already nest. It is highly unlikely to have more than two levels,
      thus it should be fine when we add the second entry into the ring buffer. If
      that proves to be a problem, we can always up the number to 8.
      
      An added benefit is that reading preempt_count() to get the current level
      adds a very slight but noticeable overhead. This removes that need.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  15. 03 Aug, 2017 1 commit
    • S
      ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU · a7e52ad7
      Committed by Steven Rostedt (VMware)
      Chunyu Hu reported:
        "per_cpu trace directories and files are created for all possible cpus,
         but only the cpus which have ever been on-lined have their own per cpu
         ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
         open handler for trace file 'trace_pipe_raw' is always trying to access
         field of ring_buffer_per_cpu, and would panic with the NULL pointer.
      
         Align the behavior of trace_pipe_raw with trace_pipe, that returns -ENODEV
         when opening it if that cpu does not have trace ring buffer.
      
         Reproduce:
         cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
         (cpu31 is never on-lined, this is a 16 cores x86_64 box)
      
         Tested with:
         1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
            Got -ENODEV.
         2) online cpu15, read trace_pipe_raw of cpu15.
            Get the raw trace data.
      
         Call trace:
         [ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
         [ 5760.961678]  tracing_buffers_read+0x1f6/0x230
         [ 5760.962695]  __vfs_read+0x37/0x160
         [ 5760.963498]  ? __vfs_read+0x5/0x160
         [ 5760.964339]  ? security_file_permission+0x9d/0xc0
         [ 5760.965451]  ? __vfs_read+0x5/0x160
         [ 5760.966280]  vfs_read+0x8c/0x130
         [ 5760.967070]  SyS_read+0x55/0xc0
         [ 5760.967779]  do_syscall_64+0x67/0x150
         [ 5760.968687]  entry_SYSCALL64_slow_path+0x25/0x25"
      
      This was introduced by the addition of the feature to reuse reader pages
      instead of re-allocating them. The problem is that the allocation of a
      reader page (which is per cpu) does not check if the cpu is online and set
      up for the ring buffer.
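
The missing check can be modeled in a few lines. This is a sketch with invented names: per-CPU buffers exist only for CPUs that have been online, and the allocation path reports ENODEV for the others instead of dereferencing a missing buffer.

```python
import errno
import os


class RingBuffer:
    """Per-CPU buffers exist only for CPUs that were ever brought online."""

    def __init__(self, onlined_cpus):
        self.per_cpu = {cpu: [] for cpu in onlined_cpus}

    def cpu_up(self, cpu):
        self.per_cpu.setdefault(cpu, [])

    def alloc_read_page(self, cpu):
        if cpu not in self.per_cpu:
            # No buffer was ever set up for this CPU: report it cleanly.
            raise OSError(errno.ENODEV, os.strerror(errno.ENODEV))
        return bytearray(4096)
```

Reading from a CPU that was never onlined now fails with ENODEV, and succeeds once `cpu_up()` has created its buffer.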
      
      Link: http://lkml.kernel.org/r/1500880866-1177-1-git-send-email-chuhu@redhat.com
      
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Reported-by: Chunyu Hu <chuhu@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  16. 19 Jul, 2017 1 commit
  17. 01 May, 2017 1 commit
    • S
      ring-buffer: Return reader page back into existing ring buffer · 73a757e6
      Committed by Steven Rostedt (VMware)
      When reading the ring buffer for consuming, it is optimized for splice,
      where a page is taken out of the ring buffer (zero copy) and sent to the
      reading consumer. When the read is finished with the page, it calls
      ring_buffer_free_read_page(), which simply frees the page. The next time the
      reader needs to get a page from the ring buffer, it must call
      ring_buffer_alloc_read_page() which allocates and initializes a reader page
      for the ring buffer to be swapped into the ring buffer for a new filled page
      for the reader.
      
      The problem is that there's no reason to actually free the page when it is
      passed back to the ring buffer. It can hold it off and reuse it for the next
      iteration. This completely removes the interaction with the page_alloc
      mechanism.
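
The saving can be illustrated by counting allocations in a toy pool. This is only a model with hypothetical names: a single cached page stands in for whatever the ring buffer actually keeps, and the counts it produces are not the measurements reported in this commit.

```python
class ReaderPagePool:
    """Counts how often a fresh page has to be allocated."""

    def __init__(self, reuse):
        self.reuse = reuse
        self.cached = None
        self.allocations = 0

    def alloc_read_page(self):
        if self.cached is not None:
            page, self.cached = self.cached, None
            return page
        self.allocations += 1
        return bytearray(4096)

    def free_read_page(self, page):
        if self.reuse and self.cached is None:
            self.cached = page  # hold on to it for the next reader
        # Otherwise the page is simply dropped (freed).


def count_allocations(reuse, reads):
    """Run `reads` alloc/free cycles and report the allocation count."""
    pool = ReaderPagePool(reuse)
    for _ in range(reads):
        pool.free_read_page(pool.alloc_read_page())
    return pool.allocations
```

In this model, 1000 read cycles cost 1000 allocations without reuse and a single allocation with it.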
      
      Using the trace-cmd utility to record all events (causing trace-cmd to
      require reading lots of pages from the ring buffer, and calling
      ring_buffer_alloc/free_read_page() several times), and also assigning a
      stack trace trigger to the mm_page_alloc event, we can see how many times
      the ring_buffer_alloc_read_page() needed to allocate a page for the ring
      buffer.
      
      Before this change:
      
        # trace-cmd record -e all -e mm_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        9968
      
      After this change:
      
        # trace-cmd record -e all -e mm_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        4
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  18. 20 Apr, 2017 1 commit
    • S
      ring-buffer: Have ring_buffer_iter_empty() return true when empty · 78f7a45d
      Committed by Steven Rostedt (VMware)
      I noticed that reading the snapshot file when it is empty no longer gives a
      status. It is supposed to show the status of the snapshot buffer as well as
      how to allocate and use it. For example:
      
       ># cat snapshot
       # tracer: nop
       #
       #
       # * Snapshot is allocated *
       #
       # Snapshot commands:
       # echo 0 > snapshot : Clears and frees snapshot buffer
       # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
       #                      Takes a snapshot of the main buffer.
       # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
       #                      (Doesn't have to be '2' works with any number that
       #                       is not a '0' or '1')
      
      But instead it just showed an empty buffer:
      
       ># cat snapshot
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 0/0   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
      
      What happened was that it was using the ring_buffer_iter_empty() function to
      see if it was empty, and if it was, it showed the status. But that function
      was returning false when it was empty. The reason was that the iter header
      page was on the reader page, and the reader page was empty, but so was the
      buffer itself. The check only tested whether the iter was on the commit
      page. The commit page was no longer pointing to the reader page, but since
      all pages were empty, the buffer was empty as well.
      
      Cc: stable@vger.kernel.org
      Fixes: 651e22f2 ("ring-buffer: Always reset iterator to reader page")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  19. 05 Apr, 2017 1 commit
  20. 02 Mar, 2017 1 commit
  21. 13 Dec, 2016 1 commit
  22. 07 Dec, 2016 1 commit
  23. 02 Dec, 2016 1 commit
  24. 24 Nov, 2016 5 commits
  25. 14 May, 2016 1 commit
    • S
      ring-buffer: Prevent overflow of size in ring_buffer_resize() · 59643d15
      Committed by Steven Rostedt (Red Hat)
      If the size passed to ring_buffer_resize() is greater than ULONG_MAX - BUF_PAGE_SIZE
      then the DIV_ROUND_UP() will return zero.
      
      Here are the details:
      
        # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
      
      tracing_entries_write() processes this and converts kb to bytes.
      
       18014398509481980 << 10 = 18446744073709547520
      
      and this is passed to ring_buffer_resize() as unsigned long size.
      
       size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      
      Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
      
      BUF_PAGE_SIZE is 4080 and here
      
       18446744073709547520 + 4080 - 1 = 18446744073709551599
      
      where 18446744073709551599 is still smaller than 2^64
      
       2^64 - 18446744073709551599 = 17
      
      But now 18446744073709551599 / 4080 = 4521260802379792
      
      and size = size * 4080 = 18446744073709551360
      
      This is checked to make sure it's still greater than 2 * 4080,
      which it is.
      
      Then we convert to the number of buffer pages needed.
      
       nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
      
      but this time size is 18446744073709551360 and
      
       2^64 - (18446744073709551360 + 4080 - 1) = -3823
      
      Thus it overflows and the resulting number is less than 4080, which makes
      
        3823 / 4080 = 0
      
      and nr_pages is set to this. As we already checked against the minimum that
      nr_pages may be, this causes the logic to fail as well, and we crash the
      kernel.
      
      There's no reason to have the two DIV_ROUND_UP() (that's just the result of
      historical code changes), so clean up the code and fix this bug.
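
The arithmetic above can be replayed with 64-bit wraparound emulated by masking. This is a sketch, not the kernel code: the function names are made up, and `nr_pages_single_roundup()` is only one plausible shape for the cleaned-up calculation.

```python
MASK64 = (1 << 64) - 1   # emulate a 64-bit unsigned long
BUF_PAGE_SIZE = 4080


def div_round_up_u64(a, b):
    """DIV_ROUND_UP(a, b) = (a + b - 1) / b with 64-bit wraparound."""
    return ((a + b - 1) & MASK64) // b


def nr_pages_two_roundups(size_kb):
    """Old flow: round up, scale back to bytes, then round up again."""
    size = (size_kb << 10) & MASK64
    size = div_round_up_u64(size, BUF_PAGE_SIZE)
    size = (size * BUF_PAGE_SIZE) & MASK64
    return div_round_up_u64(size, BUF_PAGE_SIZE)


def nr_pages_single_roundup(size_kb):
    """Assumed cleanup: a single DIV_ROUND_UP, so no second wrap."""
    size = (size_kb << 10) & MASK64
    return div_round_up_u64(size, BUF_PAGE_SIZE)
```

For the value written to buffer_size_kb above, `nr_pages_two_roundups(18014398509481980)` returns 0, while `nr_pages_single_roundup(18014398509481980)` returns 4521260802379792. A single round-up can still wrap for sizes within 4079 bytes of 2^64, so a range check remains necessary in real code.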
      
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  26. 13 May, 2016 1 commit
    • S
      ring-buffer: Use long for nr_pages to avoid overflow failures · 9b94a8fb
      Committed by Steven Rostedt (Red Hat)
      The size variable to change the ring buffer in ftrace is a long. The
      nr_pages used to update the ring buffer based on the size is int. On 64 bit
      machines this can cause an overflow problem.
      
      For example, the following will cause the ring buffer to crash:
      
       # cd /sys/kernel/debug/tracing
       # echo 10 > buffer_size_kb
       # echo 8556384240 > buffer_size_kb
      
      Then you get the warning of:
      
       WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
      
      Which is:
      
        RB_WARN_ON(cpu_buffer, nr_removed);
      
      Note each ring buffer page holds 4080 bytes.
      
      This is because:
      
       1) 10 causes the ring buffer to have 3 pages.
          (10kb requires 3 pages of 4080 bytes to hold)
      
       2) (2^31 / 2^10  + 1) * 4080 = 8556384240
          The value written into buffer_size_kb is shifted by 10 and then passed
          to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
      
       3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
          which is 4080. 8761737461760 / 4080 = 2147484672
      
       4) The current nr_pages (3) is subtracted from the new nr_pages and we get:
          2147484669. This value is saved in a signed integer nr_pages_to_update
      
       5) 2147484669 is greater than 2^31 but smaller than 2^32, so as a signed int
          it turns into the value of -2147482627
      
       6) As the value is a negative number, in update_pages_handler() it is
          negated and passed to rb_remove_pages() and 2147482627 pages will
          be removed, which is much larger than 3 and it causes the warning
          because not all the pages asked to be removed were removed.
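
Steps 2 to 5 can be replayed numerically. This is a sketch with made-up helper names, where the truncation to a 32-bit signed int is emulated explicitly because Python integers do not overflow.

```python
BUF_PAGE_SIZE = 4080


def to_int32(value):
    """Emulate storing value in a 32-bit signed int (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


def pages_to_update(new_size_kb, current_pages):
    """Page delta computed with a wide type (a long in the kernel)."""
    size = new_size_kb << 10              # kb to bytes
    nr_pages = size // BUF_PAGE_SIZE      # divides exactly in this example
    return nr_pages - current_pages


delta = pages_to_update(8556384240, 3)
truncated = to_int32(delta)
```

Here `delta` is 2147484669 and `truncated` is -2147482627, which matches steps 4 and 5. Keeping the value in a wider type, as the commit title suggests, avoids the sign flip.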
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
      
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: 7a8e76a3 ("tracing: unified trace buffer")
      Reported-by: Hao Qin <QEver.cn@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  27. 26 Nov, 2015 1 commit
  28. 24 Nov, 2015 3 commits