1. 28 Dec, 2017 2 commits
    • S
      ring-buffer: Do no reuse reader page if still in use · ae415fa4
      Steven Rostedt (VMware) committed
      To free the reader page that is allocated with ring_buffer_alloc_read_page(),
      ring_buffer_free_read_page() must be called. For faster performance, this
      page can be reused by the ring buffer to avoid having to free and allocate
      new pages.
      
      The issue arises when the page is used with a splice pipe into the
      networking code. The networking code may up the page counter for the page,
      and keep it active while it is queued to go to the network. The
      incrementing of the page ref does not prevent it from being reused in the
      ring buffer, and this can cause the page that is being sent out to the
      network to be modified before it is sent by reading new data.
      
      Add a check to the page ref counter, and only reuse the page if it is not
      being used anywhere else.
      
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      ae415fa4
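The reuse rule described in this commit can be sketched as a toy model. The `Page` and `ReaderPageCache` classes below are hypothetical stand-ins written in Python, not kernel code; `refcount` plays the role of the page reference counter, and a count above 1 is assumed to mean that another user (such as the networking code) still holds the page.

```python
class Page:
    """Hypothetical stand-in for a page with a reference counter."""

    def __init__(self):
        self.refcount = 1          # the reader holds the only reference
        self.data = bytearray(4096)


class ReaderPageCache:
    """Keeps at most one spare reader page for reuse (illustrative only)."""

    def __init__(self):
        self.spare = None

    def free_read_page(self, page):
        # Reuse the page only when nobody else still references it.
        if page.refcount == 1 and self.spare is None:
            self.spare = page
            return True            # page kept for the next reader
        page.refcount -= 1         # drop our reference; the last holder frees it
        return False


cache = ReaderPageCache()

in_flight = Page()
in_flight.refcount += 1            # e.g. a splice consumer took a reference
kept_busy = cache.free_read_page(in_flight)

idle = Page()
kept_idle = cache.free_read_page(idle)
```

In this sketch the page that is still referenced elsewhere is never handed back for reuse, so its contents cannot be overwritten while it is queued.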
    • S
      ring-buffer: Mask out the info bits when returning buffer page length · 45d8b80c
      Steven Rostedt (VMware) committed
      Two info bits were added to the "commit" part of the ring buffer data page
      when returned to be consumed. This was to inform the user space readers that
      events have been missed, and that the count may be stored at the end of the
      page.
      
      What wasn't handled was the splice code that actually called a function to
      return the length of the data in order to zero out the rest of the page
      before sending it up to user space. These data bits were returned with the
      length making the value negative, and that negative value was not checked.
      It was compared to PAGE_SIZE, and only used if the size was less than
      PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
      unsigned compare, meaning the negative size value did not end up causing a
      large portion of memory to be randomly zeroed out.
      
      Cc: stable@vger.kernel.org
      Fixes: 66a8cb95 ("ring-buffer: Add place holder recording of dropped events")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      45d8b80c
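The signed/unsigned behaviour described in this commit can be illustrated with a small Python sketch that emulates C integer widths. The bit positions of the two info bits (31 and 30), the 32-bit signed length, and the 64-bit `unsigned long` are assumptions made for illustration; the helper names are hypothetical.

```python
PAGE_SIZE = 4096
MISSED_EVENTS = 1 << 31   # assumed position of the first info bit
MISSED_STORED = 1 << 30   # assumed position of the second info bit
INFO_BITS = MISSED_EVENTS | MISSED_STORED


def as_int32(value):
    """Interpret the low 32 bits of value as a signed C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


def as_unsigned_long(value):
    """Convert to a 64-bit unsigned value, as an unsigned compare would."""
    return value & ((1 << 64) - 1)


raw_commit = MISSED_EVENTS | 100          # 100 bytes of data plus an info bit

# Without masking: the length looks negative as a signed int ...
signed_len = as_int32(raw_commit)
# ... but the comparison against PAGE_SIZE is unsigned, so it is not "less than".
used_unmasked = as_unsigned_long(signed_len) < PAGE_SIZE

# With the info bits masked out, the real length is recovered.
masked_len = raw_commit & ~INFO_BITS
used_masked = masked_len < PAGE_SIZE
```

Under these assumptions the unmasked value is rejected by the unsigned compare, which is why no memory was zeroed by accident, while the masked value gives the intended length.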
  2. 04 Dec, 2017 1 commit
  3. 16 Nov, 2017 1 commit
  4. 25 Oct, 2017 1 commit
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland committed
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6aa7de05
  5. 04 Oct, 2017 1 commit
    • S
      ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler · 1a149d7d
      Steven Rostedt (VMware) committed
      The current method to prevent the ring buffer from entering into a recursive
      loop is to use a bitmask and set the bit that maps to the current context
      (normal, softirq, irq or NMI), and if that bit was already set, it is
      considered a recursive loop.
      
      New code is being added that may require the ring buffer to be entered a
      second time in the current context. The recursive locking prevents that from
      happening. Instead of mapping a bitmask to the current context, just allow 4
      levels of nesting in the ring buffer. This matches the 4 context levels that
      it can already nest. It is highly unlikely to have more than two levels,
      thus it should be fine when we add the second entry into the ring buffer. If
      that proves to be a problem, we can always up the number to 8.
      
      An added benefit is that reading preempt_count() to get the current level
      adds a very slight but noticeable overhead. This removes that need.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1a149d7d
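The difference between the two recursion guards can be sketched as follows. This is a hypothetical Python model of the idea in the commit message, not the kernel implementation: `BitmaskGuard` stands for the old per-context bit, `CounterGuard` for the new nesting counter, and the limit of 4 comes from the text above.

```python
MAX_NESTING = 4   # number of nesting levels allowed, per the commit message


class BitmaskGuard:
    """Old scheme: one bit per context (normal, softirq, irq, NMI)."""

    def __init__(self):
        self.mask = 0

    def lock(self, context):
        bit = 1 << context
        if self.mask & bit:
            return False          # same context entered twice: treated as recursion
        self.mask |= bit
        return True

    def unlock(self, context):
        self.mask &= ~(1 << context)


class CounterGuard:
    """New scheme: only count how deep we are, whatever the context."""

    def __init__(self):
        self.depth = 0

    def lock(self):
        if self.depth >= MAX_NESTING:
            return False          # too deep: treated as recursion
        self.depth += 1
        return True

    def unlock(self):
        self.depth -= 1


old = BitmaskGuard()
old_first = old.lock(0)
old_second = old.lock(0)          # second entry in the same context is refused

new = CounterGuard()
new_results = [new.lock() for _ in range(MAX_NESTING + 1)]
```

The counter accepts a second entry from the same context, which the bitmask refuses, and it does not need to know which context it is running in.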
  6. 03 Aug, 2017 1 commit
    • S
      ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU · a7e52ad7
      Steven Rostedt (VMware) committed
      Chunyu Hu reported:
        "per_cpu trace directories and files are created for all possible cpus,
         but only the cpus which have ever been on-lined have their own per cpu
         ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
         open handler for trace file 'trace_pipe_raw' is always trying to access
         field of ring_buffer_per_cpu, and would panic with the NULL pointer.
      
         Align the behavior of trace_pipe_raw with trace_pipe, which returns -ENODEV
         when opening it if that cpu does not have a trace ring buffer.
      
         Reproduce:
         cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
         (cpu31 is never on-lined, this is a 16 cores x86_64 box)
      
         Tested with:
         1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
            Got -ENODEV.
         2) online cpu15, read trace_pipe_raw of cpu15.
            Get the raw trace data.
      
         Call trace:
         [ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
         [ 5760.961678]  tracing_buffers_read+0x1f6/0x230
         [ 5760.962695]  __vfs_read+0x37/0x160
         [ 5760.963498]  ? __vfs_read+0x5/0x160
         [ 5760.964339]  ? security_file_permission+0x9d/0xc0
         [ 5760.965451]  ? __vfs_read+0x5/0x160
         [ 5760.966280]  vfs_read+0x8c/0x130
         [ 5760.967070]  SyS_read+0x55/0xc0
         [ 5760.967779]  do_syscall_64+0x67/0x150
         [ 5760.968687]  entry_SYSCALL64_slow_path+0x25/0x25"
      
      This was introduced by the addition of the feature to reuse reader pages
      instead of re-allocating them. The problem is that the allocation of a
      reader page (which is per cpu) does not check if the cpu is online and set
      up for the ring buffer.
      
      Link: http://lkml.kernel.org/r/1500880866-1177-1-git-send-email-chuhu@redhat.com
      
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Reported-by: Chunyu Hu <chuhu@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      a7e52ad7
  7. 19 Jul, 2017 1 commit
  8. 01 May, 2017 1 commit
    • S
      ring-buffer: Return reader page back into existing ring buffer · 73a757e6
      Steven Rostedt (VMware) committed
      When reading the ring buffer for consuming, it is optimized for splice,
      where a page is taken out of the ring buffer (zero copy) and sent to the
      reading consumer. When the read is finished with the page, it calls
      ring_buffer_free_read_page(), which simply frees the page. The next time the
      reader needs to get a page from the ring buffer, it must call
      ring_buffer_alloc_read_page() which allocates and initializes a reader page
      for the ring buffer to be swapped into the ring buffer for a new filled page
      for the reader.
      
      The problem is that there's no reason to actually free the page when it is
      passed back to the ring buffer. It can hold it off and reuse it for the next
      iteration. This completely removes the interaction with the page_alloc
      mechanism.
      
      Using the trace-cmd utility to record all events (causing trace-cmd to
      require reading lots of pages from the ring buffer, and calling
      ring_buffer_alloc/free_read_page() several times), and also assigning a
      stack trace trigger to the mm_page_alloc event, we can see how many times
      the ring_buffer_alloc_read_page() needed to allocate a page for the ring
      buffer.
      
      Before this change:
      
        # trace-cmd record -e all -e mm_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        9968
      
      After this change:
      
        # trace-cmd record -e all -e mm_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        4
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      73a757e6
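Why holding on to the freed reader page removes almost all allocations can be seen in a minimal simulation. The function below is a hypothetical Python model of repeated alloc/free cycles with and without a single spare page; it is not the kernel code and does not try to reproduce the measured counts above.

```python
def count_allocations(cycles, reuse):
    """Count fresh page allocations over a number of read cycles."""
    spare = None
    allocations = 0
    for _ in range(cycles):
        # "alloc": take the spare page when there is one, otherwise allocate.
        if reuse and spare is not None:
            page, spare = spare, None
        else:
            page = bytearray(4096)
            allocations += 1
        # ... the reader would consume the page here ...
        # "free": keep the page for the next iteration instead of freeing it.
        if reuse:
            spare = page
    return allocations


without_reuse = count_allocations(1000, reuse=False)
with_reuse = count_allocations(1000, reuse=True)
```

In this model only the very first cycle allocates when reuse is enabled, which matches the direction of the before/after numbers reported in the commit.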
  9. 20 Apr, 2017 1 commit
    • S
      ring-buffer: Have ring_buffer_iter_empty() return true when empty · 78f7a45d
      Steven Rostedt (VMware) committed
      I noticed that reading the snapshot file when it is empty no longer gives a
      status. It is supposed to show the status of the snapshot buffer as well as how
      to allocate and use it. For example:
      
       ># cat snapshot
       # tracer: nop
       #
       #
       # * Snapshot is allocated *
       #
       # Snapshot commands:
       # echo 0 > snapshot : Clears and frees snapshot buffer
       # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
       #                      Takes a snapshot of the main buffer.
       # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
       #                      (Doesn't have to be '2' works with any number that
       #                       is not a '0' or '1')
      
      But instead it just showed an empty buffer:
      
       ># cat snapshot
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 0/0   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
      
      What happened was that it was using the ring_buffer_iter_empty() function to
      see if it was empty, and if it was, it showed the status. But that function
      was returning false when it was empty. The reason was that the iter header
      page was on the reader page, and the reader page was empty, but so was the
      buffer itself. The check only tested to see if the iter was on the commit
      page, but the commit page was no longer pointing to the reader page, but as
      all pages were empty, the buffer is also.
      
      Cc: stable@vger.kernel.org
      Fixes: 651e22f2 ("ring-buffer: Always reset iterator to reader page")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      78f7a45d
  10. 05 Apr, 2017 1 commit
  11. 02 Mar, 2017 1 commit
  12. 13 Dec, 2016 1 commit
  13. 07 Dec, 2016 1 commit
  14. 02 Dec, 2016 1 commit
  15. 24 Nov, 2016 5 commits
  16. 14 May, 2016 1 commit
    • S
      ring-buffer: Prevent overflow of size in ring_buffer_resize() · 59643d15
      Steven Rostedt (Red Hat) committed
      If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
      then the DIV_ROUND_UP() will return zero.
      
      Here are the details:
      
        # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
      
      tracing_entries_write() processes this and converts kb to bytes.
      
       18014398509481980 << 10 = 18446744073709547520
      
      and this is passed to ring_buffer_resize() as unsigned long size.
      
       size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      
      Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
      
      BUF_PAGE_SIZE is 4080 and here
      
       18446744073709547520 + 4080 - 1 = 18446744073709551599
      
      where 18446744073709551599 is still smaller than 2^64
      
       2^64 - 18446744073709551599 = 17
      
      But now 18446744073709551599 / 4080 = 4521260802379792
      
      and size = size * 4080 = 18446744073709551360
      
      This is checked to make sure it's still greater than 2 * 4080,
      which it is.
      
      Then we convert to the number of buffer pages needed.
      
       nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
      
      but this time size is 18446744073709551360 and
      
       2^64 - (18446744073709551360 + 4080 - 1) = -3823
      
      Thus it overflows and the resulting number is less than 4080, which makes
      
        3823 / 4080 = 0
      
      and nr_pages is set to this. As we already checked against the minimum that
      nr_pages may be, this causes the logic to fail as well, and we crash the
      kernel.
      
      There's no reason to have the two DIV_ROUND_UP() (that's just result of
      historical code changes), clean up the code and fix this bug.
      
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      59643d15
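The arithmetic in this commit message can be replayed in Python by masking every intermediate result to 64 bits, which emulates `unsigned long` wrap-around on a 64-bit machine. The helper name `div_round_up_u64` is made up for this sketch, and the single-division variant at the end only illustrates the idea of the cleanup, not the exact patch.

```python
MASK64 = (1 << 64) - 1
BUF_PAGE_SIZE = 4080


def div_round_up_u64(a, b):
    """DIV_ROUND_UP(a, b) = (a + b - 1) / b with 64-bit unsigned wrap-around."""
    return ((a + b - 1) & MASK64) // b


# buffer_size_kb value converted from kb to bytes
size = (18014398509481980 << 10) & MASK64

# Buggy flow: round up to pages, convert back to bytes, round up again.
pages_first = div_round_up_u64(size, BUF_PAGE_SIZE)
size_rounded = (pages_first * BUF_PAGE_SIZE) & MASK64
wrapped_sum = (size_rounded + BUF_PAGE_SIZE - 1) & MASK64   # overflows here
nr_pages_buggy = div_round_up_u64(size_rounded, BUF_PAGE_SIZE)

# Sketch of the cleanup: compute the page count once and keep it.
nr_pages_once = div_round_up_u64(size, BUF_PAGE_SIZE)
```

The second round-up is the step that wraps past 2^64 and yields zero pages; computing the page count only once avoids that second addition for this input.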
  17. 13 May, 2016 1 commit
    • S
      ring-buffer: Use long for nr_pages to avoid overflow failures · 9b94a8fb
      Steven Rostedt (Red Hat) committed
      The size variable to change the ring buffer in ftrace is a long. The
      nr_pages used to update the ring buffer based on the size is int. On 64 bit
      machines this can cause an overflow problem.
      
      For example, the following will cause the ring buffer to crash:
      
       # cd /sys/kernel/debug/tracing
       # echo 10 > buffer_size_kb
       # echo 8556384240 > buffer_size_kb
      
      Then you get the warning of:
      
       WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
      
      Which is:
      
        RB_WARN_ON(cpu_buffer, nr_removed);
      
      Note each ring buffer page holds 4080 bytes.
      
      This is because:
      
       1) 10 causes the ring buffer to have 3 pages.
          (10kb requires 3 pages of 4080 bytes to hold)
      
       2) (2^31 / 2^10  + 1) * 4080 = 8556384240
          The value written into buffer_size_kb is shifted by 10 and then passed
          to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
      
       3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
          which is 4080. 8761737461760 / 4080 = 2147484672
      
       4) nr_pages is subtracted from the current nr_pages (3) and we get:
          2147484669. This value is saved in a signed integer nr_pages_to_update
      
       5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
          turns into the value of -2147482627
      
       6) As the value is a negative number, in update_pages_handler() it is
          negated and passed to rb_remove_pages() and 2147482627 pages will
          be removed, which is much larger than 3 and it causes the warning
          because not all the pages asked to be removed were removed.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
      
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: 7a8e76a3 ("tracing: unified trace buffer")
      Reported-by: Hao Qin <QEver.cn@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      9b94a8fb
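Steps 1 to 5 above can be checked with a short Python sketch. Python integers do not overflow, so the hypothetical helper `as_int32` emulates storing the result in a 32-bit signed `int`; the variable names are chosen for this example only.

```python
BUF_PAGE_SIZE = 4080


def as_int32(value):
    """Interpret the low 32 bits of value as a signed C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


current_pages = 3                          # after "echo 10 > buffer_size_kb"
size_kb = (2**31 // 2**10 + 1) * 4080      # the value written in the example
size_bytes = size_kb << 10                 # kb converted to bytes
requested_pages = (size_bytes + BUF_PAGE_SIZE - 1) // BUF_PAGE_SIZE

pages_to_update = requested_pages - current_pages
pages_to_update_int = as_int32(pages_to_update)   # what a signed int would hold
```

The difference fits easily in a 64-bit `long` but turns negative once it is truncated to 32 bits, which is what sends the resize down the page-removal path.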
  18. 26 Nov, 2015 1 commit
  19. 24 Nov, 2015 4 commits
    • S
      ring-buffer: Remove redundant update of page timestamp · 70004986
      Steven Rostedt (Red Hat) committed
      The first commit of a buffer page updates the timestamp of that page. No
      need to have the update to the next page add the timestamp too. It will only
      be replaced by the first commit on that page anyway.
      
      Only update to a page if it contains an event.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      70004986
    • S
      ring-buffer: Use READ_ONCE() for most tail_page access · 8573636e
      Steven Rostedt (Red Hat) committed
      As cpu_buffer->tail_page may be modified by interrupts at almost any time,
      the flow of logic is very important. Do not let gcc get smart with
      re-reading cpu_buffer->tail_page by adding READ_ONCE() around most of its
      accesses.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8573636e
    • S
      ring-buffer: Put back the length if crossed page with add_timestamp · bd1b7cd3
      Steven Rostedt (Red Hat) committed
      Commit fcc742ea "ring-buffer: Add event descriptor to simplify passing
      data" added a descriptor that holds various data instead of passing around
      several variables through parameters. The problem was that one of the
      parameters was modified in a function and the code was designed not to have
      an effect on that modified parameter. Now that the parameter is a
      descriptor and any modifications to it are non-volatile, the size of the
      data could be unnecessarily expanded.
      
      Remove the extra space added if a timestamp was added and the event went
      across the page.
      
      Cc: stable@vger.kernel.org # 4.3+
      Fixes: fcc742ea "ring-buffer: Add event descriptor to simplify passing data"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bd1b7cd3
    • S
      ring-buffer: Update read stamp with first real commit on page · b81f472a
      Steven Rostedt (Red Hat) committed
      Do not update the read stamp after swapping out the reader page from the
      write buffer. If the reader page is swapped out of the buffer before an
      event is written to it, then the read_stamp may get an out of date
      timestamp, as the page timestamp is updated on the first commit to that
      page.
      
      rb_get_reader_page() only returns a page if it has an event on it, otherwise
      it will return NULL. At that point, check if the page being returned has
      events and has not been read yet. Then at that point update the read_stamp
      to match the time stamp of the reader page.
      
      Cc: stable@vger.kernel.org # 2.6.30+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b81f472a
  20. 03 Nov, 2015 4 commits
  21. 03 Sep, 2015 1 commit
    • S
      ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated" · b7dc42fd
      Steven Rostedt (Red Hat) committed
      The commit a4543a2f "ring-buffer: Get timestamp after event is
      allocated" is needed for some future work. But after adding it, there is a
      race somewhere that causes the saved timestamp to have a slight shift, and
      get ahead of the actual timestamp and make it look like time goes backwards.
      
      I'm still looking into why this happens, but in the meantime, this is
      holding up other work to get in. I'm reverting the change for now (which
      makes the problem go away), and will add it back after I know what is wrong
      and fix it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b7dc42fd
  22. 21 Jul, 2015 5 commits
    • S
      ring-buffer: Reorganize function locations · d90fd774
      Steven Rostedt (Red Hat) committed
      Functions in ring_buffer.c have gotten interleaved between different
      use cases. Move the functions around to get like functions closer
      together. This may or may not help gcc keep cache locality, but it
      makes it a little easier to work with the code.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d90fd774
    • S
      ring-buffer: Make sure event has enough room for extend and padding · 7d75e683
      Steven Rostedt (Red Hat) committed
      Now that events only add time extends after it is committed, in case
      an event comes in before it can discard the allocated event, the time
      extend needs to be stored within the event. If the event is bigger
      than the size needed for the time extend, padding must be added.
      The minimum padding size is 8 bytes. Thus if the event is 12 bytes
      (size of time extend + 4), there will not be enough room to add both
      the time extend and padding. Make sure all events are either 8 bytes
      or 16 or more bytes.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      7d75e683
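The sizing rule at the end of this message ("either 8 bytes or 16 or more bytes") can be sketched as a small length-adjustment function. This is an illustrative Python model: the 4-byte alignment, the function name `padded_event_length`, and the requirement that the input is at least 8 bytes are assumptions of the sketch, while the 8-byte time extend and the problematic 12-byte case come from the text above.

```python
RB_ALIGNMENT = 4        # assumed event alignment in bytes
TIME_EXTEND_LEN = 8     # size of a time extend, per the commit message


def padded_event_length(length):
    """Round an event length so it can hold a time extend plus padding."""
    if length < TIME_EXTEND_LEN:
        raise ValueError("this sketch assumes events of at least 8 bytes")
    # Round up to the assumed alignment.
    length = (length + RB_ALIGNMENT - 1) & ~(RB_ALIGNMENT - 1)
    # 12 bytes leaves 4 bytes after the time extend, too little for the
    # minimum 8-byte padding, so grow the event to 16 bytes.
    if length == TIME_EXTEND_LEN + RB_ALIGNMENT:
        length += RB_ALIGNMENT
    return length


lengths = [padded_event_length(n) for n in (8, 9, 12, 13, 16, 20)]
```

With this adjustment an event is either exactly the size of a time extend, or large enough for a time extend followed by at least 8 bytes of padding.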
    • S
      ring-buffer: Get timestamp after event is allocated · a4543a2f
      Steven Rostedt (Red Hat) committed
      Move the capturing of the timestamp to after an event is allocated.
      If the event is not a commit (where it is an event that preempted
      another event), then no timestamp is needed, because the delta of
      nested events is always zero.
      
      If the event starts on a new page, no delta needs to be calculated
      as the full timestamp will be added to the page header, and the
      event will have a delta of zero.
      
      Now if the event requires a time extend (the delta does not fit
      in the 27 bit delta slot in the header), then the event is discarded,
      the length is extended to hold the TIME_EXTEND event that allows for
      a 59 bit delta, and the commit is tried again.
      
      If the event can't be discarded (another event came in after it),
      then the TIME_EXTEND is added directly to the allocated event and
      the rest of the event is given padding.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a4543a2f
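The decision of when a time extend is needed can be sketched as a bit-width check. The 27-bit and 59-bit widths come from the commit message; the split of the 59-bit delta into 27 low bits and 32 high bits, and the function names, are assumptions made for this Python illustration.

```python
DELTA_BITS = 27     # delta slot in the event header
EXTEND_BITS = 59    # delta range available with a TIME_EXTEND event


def needs_time_extend(delta):
    """True when the delta does not fit in the 27-bit header slot."""
    return (delta >> DELTA_BITS) != 0


def split_delta(delta):
    """Split a delta into (low 27 bits, remaining high bits)."""
    if delta >> EXTEND_BITS:
        raise ValueError("delta does not fit in 59 bits")
    low = delta & ((1 << DELTA_BITS) - 1)
    high = delta >> DELTA_BITS
    return low, high


def join_delta(low, high):
    """Rebuild the full delta from its two parts."""
    return (high << DELTA_BITS) | low


small = (1 << DELTA_BITS) - 1       # largest delta that fits in the header
large = (1 << EXTEND_BITS) - 1      # largest delta a time extend can carry
low, high = split_delta(large)
```

A reader would apply `join_delta` to recover the full delta, so splitting loses no information as long as the delta stays within 59 bits.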
    • S
      ring-buffer: Move the adding of the extended timestamp out of line · 9826b273
      Steven Rostedt (Red Hat) committed
      Requiring an extended time stamp is an uncommon occurrence, and it is
      best to do it out of line when needed.
      
      Add a noinline function that handles the extended timestamp and
      have it called with an unlikely to completely move it out of the
      fast path.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      9826b273
    • S
      ring-buffer: Add event descriptor to simplify passing data · fcc742ea
      Steven Rostedt (Red Hat) committed
      Add rb_event_info descriptor to pass event info to functions a bit
      easier than using a bunch of parameters. This will also allow for
      changing the code around a bit to find better fast paths.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      fcc742ea
  23. 29 May, 2015 3 commits
    • S
      ring-buffer: Add enum names for the context levels · a497adb4
      Steven Rostedt (Red Hat) committed
      Instead of having hard coded numbers for the context levels, use
      enums to describe them more.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a497adb4
    • S
      ring-buffer: Remove useless unused tracing_off_permanent() · 3c6296f7
      Steven Rostedt (Red Hat) committed
      The tracing_off_permanent() call is a way to disable all ring_buffers.
      Nothing uses it and nothing should use it, as tracing_off() and
      friends are better, as they disable the ring buffers related to
      tracing. The tracing_off_permanent() even disabled non tracing
      ring buffers. This is a bit drastic, and was added to handle NMIs
      doing outputs that could corrupt the ring buffer when only tracing
      used them. It is now obsolete and adds a little overhead, it should
      be removed.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3c6296f7
    • S
      ring-buffer: Give NMIs a chance to lock the reader_lock · 289a5a25
      Steven Rostedt (Red Hat) committed
      Currently, if an NMI does a dump of a ring buffer, it disables
      all ring buffers from ever doing any writes again. This is because
      it won't take the locks for the cpu_buffer and this can cause
      corruption if it preempted a read, or a read happens on another
      CPU for the current cpu buffer. This is a bit overkill.
      
      First, it should at least try to take the lock, and if it fails
      then disable it. Also, there's no need to disable all ring
      buffers, even those that are unrelated to what is being read.
      Only disable the per cpu ring buffer that is being read if
      it can not get the lock for it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      289a5a25