Commits · ae415fa4c5248a8cf4faabd5a3c20576cb1ad607 · OpenHarmony / kernel_linux

28 Dec, 2017 2 commits

ring-buffer: Do no reuse reader page if still in use · ae415fa4

Authored by Steven Rostedt (VMware) on Dec 22, 2017

To free the reader page that is allocated with ring_buffer_alloc_read_page(),
ring_buffer_free_read_page() must be called. For faster performance, this
page can be reused by the ring buffer to avoid having to free and allocate
new pages.

The issue arises when the page is used with a splice pipe into the
networking code. The networking code may up the page counter for the page,
and keep it active while it is queued to go to the network. The
incrementing of the page ref does not prevent it from being reused in the
ring buffer, and this can cause the page that is being sent out to the
network to be modified by newly read data before it is sent.

Add a check to the page ref counter, and only reuse the page if it is not
being used anywhere else.

Cc: stable@vger.kernel.org
Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

ae415fa4
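
The check described in ae415fa4 can be sketched in plain userspace C. This is a minimal illustration, not the kernel patch: `struct fake_page`, `cached_page`, and `return_read_page()` are hypothetical names, and the integer `refcount` stands in for the kernel's real page reference count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page with a reference count. */
struct fake_page {
    int refcount;   /* 1 means only the reader holds the page */
};

/* Hypothetical one-slot cache of a page kept for reuse. */
static struct fake_page *cached_page;

/*
 * Hand a reader page back. Returns true if the page was kept for reuse,
 * false if only the caller's reference was dropped.
 * Sketch only: assumes the cache slot is empty when called.
 */
static bool return_read_page(struct fake_page *page)
{
    if (page->refcount > 1) {
        /* Another user (e.g. the network stack) still holds the page. */
        page->refcount--;   /* drop only our reference */
        return false;
    }
    cached_page = page;     /* sole owner: safe to reuse later */
    return true;
}
```

A page whose count is above one is never placed in the cache, so a later read cannot overwrite data that is still queued elsewhere.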

ring-buffer: Mask out the info bits when returning buffer page length · 45d8b80c

Authored by Steven Rostedt (VMware) on Dec 22, 2017

Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.

What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.

Cc: stable@vger.kernel.org
Fixes: 66a8cb95 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

45d8b80c
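
The effect described in 45d8b80c can be reproduced with a small userspace sketch. The macro and function names below are chosen for illustration, and placing the two info bits at the top of a 32-bit field is an assumption made for the example. The conversion of an out-of-range value to `int` is implementation-defined in C; GCC and Clang wrap it to a negative number.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag layout: two info bits at the top of a 32-bit field. */
#define MISSED_EVENTS  (UINT32_C(1) << 31)
#define MISSED_STORED  (UINT32_C(1) << 30)
#define MISSED_FLAGS   (MISSED_EVENTS | MISSED_STORED)

/* Return only the data length, with the info bits masked out. */
static size_t page_len(uint32_t commit)
{
    return (size_t)(commit & ~MISSED_FLAGS);
}

/* The unmasked value as it looks when stored in a signed int. */
static int raw_len(uint32_t commit)
{
    return (int)commit;
}
```

With `MISSED_EVENTS` set, `raw_len()` is negative, and comparing it against an `unsigned long` page size converts it to a huge unsigned value, so the "less than PAGE_SIZE" branch is skipped rather than zeroing memory.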

04 Dec, 2017 1 commit

ring-buffer: Remove unused function __rb_data_page_index() · c4bfd39d

Authored by Matthias Kaehlcke on May 17, 2017

This fixes the following warning when building with clang:

kernel/trace/ring_buffer.c:1842:1: error: unused function
    '__rb_data_page_index' [-Werror,-Wunused-function]

Link: http://lkml.kernel.org/r/20170518001415.5223-1-mka@chromium.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

c4bfd39d

16 Nov, 2017 1 commit

kmemcheck: remove annotations · 49502766

Authored by Levin, Alexander (Sasha Levin) on Nov 15, 2017

Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow).  KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
  Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

49502766

25 Oct, 2017 1 commit

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05

Authored by Mark Rutland on Oct 23, 2017

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

6aa7de05
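
What the two coccinelle rules in 6aa7de05 do to a call site can be shown with simplified userspace stand-ins. The macros below are a sketch only; the kernel's real `READ_ONCE()` and `WRITE_ONCE()` are more involved. `__typeof__` is a GCC/Clang extension, and `shared_flag`, `set_flag()`, and `get_flag()` are hypothetical names.

```c
/* Simplified stand-ins: force exactly one volatile access to x. */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

static int shared_flag;

/* Before the conversion: ACCESS_ONCE(shared_flag) = 1; */
static void set_flag(void)
{
    WRITE_ONCE(shared_flag, 1);
}

/* Before the conversion: return ACCESS_ONCE(shared_flag); */
static int get_flag(void)
{
    return READ_ONCE(shared_flag);
}
```

The first rule matches assignments through `ACCESS_ONCE()` and turns them into `WRITE_ONCE()`; the second rule converts every remaining use into `READ_ONCE()`.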

04 Oct, 2017 1 commit

ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler · 1a149d7d

Authored by Steven Rostedt (VMware) on Sep 22, 2017

The current method to prevent the ring buffer from entering into a recursive
loop is to use a bitmask and set the bit that maps to the current context
(normal, softirq, irq or NMI), and if that bit was already set, it is
considered a recursive loop.

New code is being added that may require the ring buffer to be entered a
second time in the current context. The recursive locking prevents that from
happening. Instead of mapping a bitmask to the current context, just allow 4
levels of nesting in the ring buffer. This matches the 4 context levels that
it can already nest. It is highly unlikely to have more than two levels,
thus it should be fine when we add the second entry into the ring buffer. If
that proves to be a problem, we can always up the number to 8.

An added benefit is that reading preempt_count() to get the current level
adds a very slight but noticeable overhead. This removes that need.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

1a149d7d
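
The counter-based scheme described in 1a149d7d might look roughly like this in userspace C. `struct cpu_buffer_sketch`, `MAX_NESTING`, and both function names are hypothetical; the sketch only shows the idea of replacing a per-context bitmask with a nesting depth.

```c
#include <stdbool.h>

#define MAX_NESTING 4   /* matches the four contexts named in the message */

/* Hypothetical per-CPU buffer state reduced to the nesting counter. */
struct cpu_buffer_sketch {
    int nesting;
};

/* Returns true if the buffer may be entered, false on suspected recursion. */
static bool recursive_lock(struct cpu_buffer_sketch *b)
{
    if (b->nesting >= MAX_NESTING)
        return false;
    b->nesting++;
    return true;
}

static void recursive_unlock(struct cpu_buffer_sketch *b)
{
    b->nesting--;
}
```

Because the counter does not care which context is running, a second entry from the same context is allowed, and nothing has to read the preempt count to work out the current level.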

03 Aug, 2017 1 commit

ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU · a7e52ad7

Authored by Steven Rostedt (VMware) on Aug 02, 2017

Chunyu Hu reported:
  "per_cpu trace directories and files are created for all possible cpus,
   but only the cpus which have ever been on-lined have their own per cpu
   ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
   open handler for trace file 'trace_pipe_raw' is always trying to access
   field of ring_buffer_per_cpu, and would panic with the NULL pointer.

   Align the behavior of trace_pipe_raw with trace_pipe, that returns -ENODEV
   when opening it if that cpu does not have trace ring buffer.

   Reproduce:
   cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
   (cpu31 is never on-lined, this is a 16 cores x86_64 box)

   Tested with:
   1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
      Got -ENODEV.
   2) online cpu15, read trace_pipe_raw of cpu15.
      Get the raw trace data.

   Call trace:
   [ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
   [ 5760.961678]  tracing_buffers_read+0x1f6/0x230
   [ 5760.962695]  __vfs_read+0x37/0x160
   [ 5760.963498]  ? __vfs_read+0x5/0x160
   [ 5760.964339]  ? security_file_permission+0x9d/0xc0
   [ 5760.965451]  ? __vfs_read+0x5/0x160
   [ 5760.966280]  vfs_read+0x8c/0x130
   [ 5760.967070]  SyS_read+0x55/0xc0
   [ 5760.967779]  do_syscall_64+0x67/0x150
   [ 5760.968687]  entry_SYSCALL64_slow_path+0x25/0x25"

This was introduced by the addition of the feature to reuse reader pages
instead of re-allocating them. The problem is that the allocation of a
reader page (which is per cpu) does not check if the cpu is online and set
up for the ring buffer.

Link: http://lkml.kernel.org/r/1500880866-1177-1-git-send-email-chuhu@redhat.com

Cc: stable@vger.kernel.org
Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

a7e52ad7
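
A userspace sketch of the missing check from a7e52ad7: refuse to allocate a reader page for a CPU that has no per-CPU buffer. Everything here is illustrative. `struct ring_buffer_sketch`, its 64-bit `cpumask`, and `alloc_read_page_on()` are hypothetical, and the error is reported through an `int *` instead of a kernel-style error pointer.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

/* Hypothetical buffer: one bit per CPU that has a per-CPU buffer set up. */
struct ring_buffer_sketch {
    uint64_t cpumask;
};

/*
 * Allocate a reader page for `cpu`. On failure, return NULL and store a
 * negative errno value in *err.
 */
static void *alloc_read_page_on(const struct ring_buffer_sketch *rb,
                                int cpu, int *err)
{
    void *page;

    *err = 0;
    if (cpu < 0 || cpu >= 64 || !(rb->cpumask & (UINT64_C(1) << cpu))) {
        *err = -ENODEV;     /* CPU never came online: nothing to read */
        return NULL;
    }
    page = calloc(1, PAGE_BYTES);
    if (!page)
        *err = -ENOMEM;
    return page;
}
```

With only CPUs 0 and 1 set in the mask, a request for CPU 31 fails with `-ENODEV` before any per-CPU state is touched, which is the behavior the report asks for.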

19 Jul, 2017 1 commit

tracing/ring_buffer: Try harder to allocate · 84861885

Authored by Joel Fernandes on Jul 12, 2017

ftrace can fail to allocate per-CPU ring buffers on systems with a large
number of CPUs when large amounts of memory are held in the
page cache. Currently the ring buffer allocation doesn't retry in the VM
implementation even if direct-reclaim made some progress but still
wasn't able to find a free page. On retrying I see that the allocations
almost always succeed. The retry doesn't happen because __GFP_NORETRY is
used in the tracer to prevent the case where we might OOM, however if we
drop __GFP_NORETRY, we risk destabilizing the system if OOM killer is
triggered. To prevent this situation, use the __GFP_RETRY_MAYFAIL flag
introduced recently [1].

Tested the following still succeeds without destabilizing a system with
1GB memory.
echo 300000 > /sys/kernel/debug/tracing/buffer_size_kb

[1] https://marc.info/?l=linux-mm&m=149820805124906&w=2

Link: http://lkml.kernel.org/r/20170713021416.8897-1-joelaf@google.com

Cc: Tim Murray <timmurray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

84861885

01 May, 2017 1 commit

ring-buffer: Return reader page back into existing ring buffer · 73a757e6

Authored by Steven Rostedt (VMware) on May 01, 2017

When reading the ring buffer for consuming, it is optimized for splice,
where a page is taken out of the ring buffer (zero copy) and sent to the
reading consumer. When the read is finished with the page, it calls
ring_buffer_free_read_page(), which simply frees the page. The next time the
reader needs to get a page from the ring buffer, it must call
ring_buffer_alloc_read_page() which allocates and initializes a reader page
for the ring buffer to be swapped into the ring buffer for a new filled page
for the reader.

The problem is that there's no reason to actually free the page when it is
passed back to the ring buffer. It can hold it off and reuse it for the next
iteration. This completely removes the interaction with the page_alloc
mechanism.

Using the trace-cmd utility to record all events (causing trace-cmd to
require reading lots of pages from the ring buffer, and calling
ring_buffer_alloc/free_read_page() several times), and also assigning a
stack trace trigger to the mm_page_alloc event, we can see how many times
the ring_buffer_alloc_read_page() needed to allocate a page for the ring
buffer.

Before this change:

  # trace-cmd record -e all -e mm_page_alloc -R stacktrace sleep 1
  # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
  9968

After this change:

  # trace-cmd record -e all -e mm_page_alloc -R stacktrace sleep 1
  # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
  4

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

73a757e6
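
The reuse idea in 73a757e6 can be approximated in userspace with a one-slot cache in front of `malloc()`. This is a sketch under stated assumptions: `spare_page`, `nr_allocs`, `get_read_page()`, and `put_read_page()` are hypothetical names, and a single global slot stands in for whatever per-CPU storage the kernel uses.

```c
#include <stdlib.h>

#define PAGE_BYTES 4096

/* Hypothetical one-slot cache of a page handed back by the reader. */
static void *spare_page;
static unsigned long nr_allocs;   /* how often the allocator was really hit */

static void *get_read_page(void)
{
    void *page = spare_page;

    if (page) {                /* reuse the page handed back earlier */
        spare_page = NULL;
        return page;
    }
    nr_allocs++;
    return malloc(PAGE_BYTES);
}

static void put_read_page(void *page)
{
    if (!spare_page)           /* keep one page around instead of freeing */
        spare_page = page;
    else
        free(page);
}
```

A loop that repeatedly gets and puts a page reaches the allocator only once, which mirrors the drop in allocation counts reported in the commit message.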

20 Apr, 2017 1 commit

ring-buffer: Have ring_buffer_iter_empty() return true when empty · 78f7a45d

Authored by Steven Rostedt (VMware) on Apr 19, 2017

I noticed that reading the snapshot file when it is empty no longer gives a
status. It is supposed to show the status of the snapshot buffer as well as how
to allocate and use it. For example:

 ># cat snapshot
 # tracer: nop
 #
 #
 # * Snapshot is allocated *
 #
 # Snapshot commands:
 # echo 0 > snapshot : Clears and frees snapshot buffer
 # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
 #                      Takes a snapshot of the main buffer.
 # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
 #                      (Doesn't have to be '2' works with any number that
 #                       is not a '0' or '1')

But instead it just showed an empty buffer:

 ># cat snapshot
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 0/0   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |

What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.

Cc: stable@vger.kernel.org
Fixes: 651e22f2 ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

78f7a45d

05 Apr, 2017 1 commit

ring-buffer: Fix return value check in test_ringbuffer() · 62277de7

Authored by Wei Yongjun on Jun 17, 2016

In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().

Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com

Cc: stable@vger.kernel.org
Fixes: 6c43e554 ("ring-buffer: Add ring buffer startup selftest")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

62277de7
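
Why a NULL test misses the failure in 62277de7 is easier to see with a userspace re-creation of the error-pointer convention. The helpers below imitate the kernel's `ERR_PTR()`, `PTR_ERR()`, and `IS_ERR()` but are written for this example, and `start_thread_sketch()` is a hypothetical stand-in for a function that fails the way `kthread_run()` does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical thread starter: returns an error pointer, never NULL. */
static void *start_thread_sketch(bool fail)
{
    static int thread_object;

    return fail ? ERR_PTR(-ENOMEM) : (void *)&thread_object;
}
```

A failed call returns a non-NULL value, so `if (!ptr)` never fires; only `IS_ERR(ptr)` detects the error.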

02 Mar, 2017 1 commit

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> · e6017571

Authored by Ingo Molnar on Feb 01, 2017

We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

e6017571

13 Dec, 2016 1 commit

tracing/rb: Init the CPU mask on allocation · 99e6f6e8

Authored by Sebastian Andrzej Siewior on Dec 07, 2016

Before commit b32614c0 ("tracing/rb: Convert to hotplug state
machine") the allocated cpumask was initialized to the mask of ONLINE or
POSSIBLE CPUs. After the CPU hotplug changes the buffer initialisation
moved to trace_rb_cpu_prepare() but I forgot to initially set the
cpumask to zero. This is done now.

Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de

Fixes: b32614c0 ("tracing/rb: Convert to hotplug state machine")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

99e6f6e8

07 Dec, 2016 1 commit

tracing/rb: Init the CPU mask on allocation · b18cc3de

Authored by Sebastian Andrzej Siewior on Dec 07, 2016

Before commit b32614c0 ("tracing/rb: Convert to hotplug state machine")
the allocated cpumask was initialized to the mask of online or possible
CPUs. After the CPU hotplug changes the buffer initialization moved to
trace_rb_cpu_prepare() but the cpumask is allocated with alloc_cpumask()
and therefore has random content. As a consequence the cpu buffers are not
initialized and a later access dereferences a NULL pointer.

Use zalloc_cpumask() instead so trace_rb_cpu_prepare() initializes the
buffers properly.

Fixes: b32614c0 ("tracing/rb: Convert to hotplug state machine")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20161207133133.hzkcqfllxcdi3joz@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

b18cc3de

02 Dec, 2016 1 commit

tracing/rb: Convert to hotplug state machine · b32614c0

Authored by Sebastian Andrzej Siewior on Nov 27, 2016

Install the callbacks via the state machine. The notifier in struct
ring_buffer is replaced by the multi instance interface.  Upon
__ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
the trace_rb_cpu_prepare() on each CPU.

This callback may now fail. This means __ring_buffer_alloc() will fail and
cleanup (like previously) and during a CPU up event this failure will not
allow the CPU to come up.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

b32614c0

24 Nov, 2016 5 commits

ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline · 38e11df1

Authored by Steven Rostedt (Red Hat) on Nov 23, 2016

Both rb_end_commit() and rb_set_commit_to_write() are in the fast path of
the ring buffer recording. Make sure they are always inlined.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

38e11df1

ring-buffer: Froce rb_update_write_stamp() to be inlined · babe3fce

Authored by Steven Rostedt (Red Hat) on Nov 23, 2016

The function rb_update_write_stamp() is in the hotpath of the ring buffer
recording. Make sure that it is inlined as well. There's not many places
that call it.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

babe3fce

ring-buffer: Force inline of hotpath helper functions · 2289d567

Authored by Steven Rostedt (Red Hat) on Nov 23, 2016

There's several small helper functions in ring_buffer.c that are used in the
hot path. For some reason, even though they are marked inline, gcc tends not
to enforce it. Make sure these functions are always inlined.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2289d567
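
Forcing a small helper inline, as the commits in this group do, might look like the sketch below. The attribute is the GCC/Clang `always_inline` attribute; the `ALWAYS_INLINE` macro, `struct event_sketch`, and `event_length()` are hypothetical names for this example and are not taken from ring_buffer.c.

```c
/* A plain `inline` is only a hint; this attribute makes it mandatory. */
#define ALWAYS_INLINE inline __attribute__((always_inline))

/* Hypothetical event with just a length field. */
struct event_sketch {
    unsigned int len;
};

/* Tiny hot-path helper that should never cost a function call. */
static ALWAYS_INLINE unsigned int event_length(const struct event_sketch *e)
{
    return e->len;
}
```

The trade-off is code size: every call site gets its own copy of the body, which is usually acceptable for helpers this small.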

ring-buffer: Always inline rb_event_data() · 929ddbf3

Authored by Steven Rostedt (Red Hat) on Nov 23, 2016

The rb_event_data() is the fast path of getting the ring buffer data from an
event. Externally, ring_buffer_event_data() is used to access this function.
But unfortunately, rb_event_data() is not inlined, and calling
ring_buffer_event_data() causes that function to be called again. Force
rb_event_data() to be inlined to lower the number of operations needed when
calling ring_buffer_event_data().

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

929ddbf3

ring-buffer: Make rb_reserve_next_event() always inlined · fa7ffb39

Authored by Steven Rostedt (Red Hat) on Nov 23, 2016

The function rb_reserve_next_event() is called by two functions:
ring_buffer_lock_reserve() and ring_buffer_write(). This is in a very hot
path of the tracing code, and it is best that it is not a real function call. The
two callers are basically wrappers for rb_reserve_next_event(). Removing the
function calls can save execution time in the hotpath of tracing.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

fa7ffb39

14 May, 2016 1 commit

ring-buffer: Prevent overflow of size in ring_buffer_resize() · 59643d15

Authored by Steven Rostedt (Red Hat) on May 13, 2016

If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
then the DIV_ROUND_UP() will return zero.

Here's the details:

  # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb

tracing_entries_write() processes this and converts kb to bytes.

 18014398509481980 << 10 = 18446744073709547520

and this is passed to ring_buffer_resize() as unsigned long size.

 size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);

Where DIV_ROUND_UP(a, b) is (a + b - 1)/b

BUF_PAGE_SIZE is 4080 and here

 18446744073709547520 + 4080 - 1 = 18446744073709551599

where 18446744073709551599 is still smaller than 2^64

 2^64 - 18446744073709551599 = 17

But now 18446744073709551599 / 4080 = 4521260802379792

and size = size * 4080 = 18446744073709551360

This is checked to make sure it's still greater than 2 * 4080,
which it is.

Then we convert to the number of buffer pages needed.

 nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)

but this time size is 18446744073709551360 and

 2^64 - (18446744073709551360 + 4080 - 1) = -3823

Thus it overflows and the resulting number is less than 4080, which makes

  3823 / 4080 = 0

and nr_pages is set to this. As we already checked against the minimum that
nr_pages may be, this causes the logic to fail as well, and we crash the
kernel.

There's no reason to have the two DIV_ROUND_UP() (that's just result of
historical code changes), clean up the code and fix this bug.

Cc: stable@vger.kernel.org # 3.5+
Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

59643d15
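
The arithmetic walked through in 59643d15 can be replayed with 64-bit unsigned integers. This sketch only reproduces the numbers from the message; the two function names are hypothetical, and it does not claim to show the exact shape of the fix.

```c
#include <stdint.h>

#define BUF_PAGE_SIZE UINT64_C(4080)
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Old flow: round up to whole pages, convert back to bytes, round up again. */
static uint64_t nr_pages_two_roundups(uint64_t size_bytes)
{
    uint64_t size = DIV_ROUND_UP(size_bytes, BUF_PAGE_SIZE);

    size *= BUF_PAGE_SIZE;
    return DIV_ROUND_UP(size, BUF_PAGE_SIZE);   /* the addition may wrap past 2^64 */
}

/* Single round-up: the page count is computed once and kept. */
static uint64_t nr_pages_one_roundup(uint64_t size_bytes)
{
    return DIV_ROUND_UP(size_bytes, BUF_PAGE_SIZE);
}
```

For the input 18014398509481980 << 10, the second round-up wraps around and yields 0 pages, while a single round-up yields 4521260802379792.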

13 May, 2016 1 commit

ring-buffer: Use long for nr_pages to avoid overflow failures · 9b94a8fb

Authored by Steven Rostedt (Red Hat) on May 12, 2016

The size variable to change the ring buffer in ftrace is a long. The
nr_pages used to update the ring buffer based on the size is int. On 64 bit
machines this can cause an overflow problem.

For example, the following will cause the ring buffer to crash:

 # cd /sys/kernel/debug/tracing
 # echo 10 > buffer_size_kb
 # echo 8556384240 > buffer_size_kb

Then you get the warning of:

 WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260

Which is:

  RB_WARN_ON(cpu_buffer, nr_removed);

Note each ring buffer page holds 4080 bytes.

This is because:

 1) 10 causes the ring buffer to have 3 pages.
    (10kb requires 3 pages of 4080 bytes each to hold)

 2) (2^31 / 2^10  + 1) * 4080 = 8556384240
    The value written into buffer_size_kb is shifted by 10 and then passed
    to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760

 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
    which is 4080. 8761737461760 / 4080 = 2147484672

 4) nr_pages is subtracted from the current nr_pages (3) and we get:
    2147484669. This value is saved in a signed integer nr_pages_to_update

 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
    turns into the value of -2147482627

 6) As the value is a negative number, in update_pages_handler() it is
    negated and passed to rb_remove_pages() and 2147482627 pages will
    be removed, which is much larger than 3 and it causes the warning
    because not all the pages asked to be removed were removed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001

Cc: stable@vger.kernel.org # 2.6.28+
Fixes: 7a8e76a3 ("tracing: unified trace buffer")
Reported-by: Hao Qin <QEver.cn@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9b94a8fb
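
Steps 2 through 5 of 9b94a8fb can be checked with a short userspace calculation. The function names are hypothetical. Converting an out-of-range value to `int32_t` is implementation-defined in C; GCC and Clang wrap it modulo 2^32, which is the behavior the message describes.

```c
#include <stdint.h>

#define BUF_PAGE_SIZE 4080

/* Page delta stored in a 32-bit signed int, as in the steps of the message. */
static int32_t pages_to_update_int(uint64_t size_kb, int64_t cur_pages)
{
    int64_t nr_pages = (int64_t)((size_kb << 10) / BUF_PAGE_SIZE);

    return (int32_t)(nr_pages - cur_pages);    /* truncates to 32 bits */
}

/* Same computation with a 64-bit result, which keeps the real value. */
static int64_t pages_to_update_long(uint64_t size_kb, int64_t cur_pages)
{
    int64_t nr_pages = (int64_t)((size_kb << 10) / BUF_PAGE_SIZE);

    return nr_pages - cur_pages;
}
```

Starting from 3 pages and writing 8556384240, the 32-bit version returns -2147482627 (a request to remove pages), while the 64-bit version returns 2147484669.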

26 Nov, 2015 1 commit

ring-buffer: Process commits whenever moving to a new page. · 4239c38f

Authored by Steven Rostedt (Red Hat) on Nov 17, 2015

When crossing over to a new page, commit the current work. This will allow
readers to get data with less latency, and also simplifies the work to get
timestamps working for interrupted events.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

4239c38f

24 Nov, 2015 4 commits

ring-buffer: Remove redundant update of page timestamp · 70004986

Authored by Steven Rostedt (Red Hat) on Nov 17, 2015

The first commit of a buffer page updates the timestamp of that page. No
need to have the update to the next page add the timestamp too. It will only
be replaced by the first commit on that page anyway.

Only update to a page if it contains an event.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

70004986

ring-buffer: Use READ_ONCE() for most tail_page access · 8573636e

Authored by Steven Rostedt (Red Hat) on Nov 17, 2015

As cpu_buffer->tail_page may be modified by interrupts at almost any time,
the flow of logic is very important. Do not let gcc get smart with
re-reading cpu_buffer->tail_page by adding READ_ONCE() around most of its
accesses.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

8573636e

ring-buffer: Put back the length if crossed page with add_timestamp · bd1b7cd3

Authored by Steven Rostedt (Red Hat) on Nov 23, 2015

Commit fcc742ea "ring-buffer: Add event descriptor to simplify passing
data" added a descriptor that holds various data instead of passing around
several variables through parameters. The problem was that one of the
parameters was modified in a function and the code was designed not to have
an effect on that modified parameter. Now that the parameter is a
descriptor and any modifications to it are non-volatile, the size of the
data could be unnecessarily expanded.

Remove the extra space added if a timestamp was added and the event went
across the page.

Cc: stable@vger.kernel.org # 4.3+
Fixes: fcc742ea "ring-buffer: Add event descriptor to simplify passing data"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

bd1b7cd3

ring-buffer: Update read stamp with first real commit on page · b81f472a

Authored by Steven Rostedt (Red Hat) on Nov 23, 2015

Do not update the read stamp after swapping out the reader page from the
write buffer. If the reader page is swapped out of the buffer before an
event is written to it, then the read_stamp may get an out of date
timestamp, as the page timestamp is updated on the first commit to that
page.

rb_get_reader_page() only returns a page if it has an event on it, otherwise
it will return NULL. At that point, check if the page being returned has
events and has not been read yet. Then at that point update the read_stamp
to match the time stamp of the reader page.

Cc: stable@vger.kernel.org # 2.6.30+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

b81f472a

03 Nov, 2015 4 commits

ring-buffer: rb_event_is_commit() can return boolean · cdb2a0a9

Authored by Yaowei Bai on Sep 29, 2015

Make rb_event_is_commit() return bool to improve readability
due to this particular function only using either one or zero as its
return value.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-7-git-send-email-bywxiaobai@163.com
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

cdb2a0a9

ring-buffer: rb_per_cpu_empty() can return boolean · da58834c

Authored by Yaowei Bai on Sep 29, 2015

Makes rb_per_cpu_empty() return bool to improve readability.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-6-git-send-email-bywxiaobai@163.com
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

da58834c

ring_buffer: ring_buffer_empty{cpu}() can return boolean · 3d4e204d

Authored by Yaowei Bai on Sep 29, 2015

Make ring_buffer_empty() and ring_buffer_empty_cpu() return bool.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-5-git-send-email-bywxiaobai@163.com
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

3d4e204d

ring-buffer: rb_is_reader_page() can return boolean · 06ca3209

Authored by Yaowei Bai on Sep 29, 2015

Make rb_is_reader_page() return bool to improve readability due to this
particular function only using either true or false as its return value.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-4-git-send-email-bywxiaobai@163.com
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

06ca3209

03 Sep, 2015 1 commit

ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated" · b7dc42fd

Authored by Steven Rostedt (Red Hat) on Sep 03, 2015

The commit a4543a2f "ring-buffer: Get timestamp after event is
allocated" is needed for some future work. But after adding it, there is a
race somewhere that causes the saved timestamp to have a slight shift, and
get ahead of the actual timestamp and make it look like time goes backwards.

I'm still looking into why this happens, but in the mean time, this is
holding up other work to get in. I'm reverting the change for now (which
makes the problem go away), and will add it back after I know what is wrong
and fix it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

b7dc42fd

21 Jul, 2015 5 commits

ring-buffer: Reorganize function locations · d90fd774

Authored by Steven Rostedt (Red Hat) on May 29, 2015

Functions in ring_buffer.c have gotten interleaved between different
use cases. Move the functions around to get like functions closer
together. This may or may not help gcc keep cache locality, but it
makes it a little easier to work with the code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

d90fd774

ring-buffer: Make sure event has enough room for extend and padding · 7d75e683

Authored by Steven Rostedt (Red Hat) on May 29, 2015

Now that events only add time extends after it is committed, in case
an event comes in before it can discard the allocated event, the time
extend needs to be stored within the event. If the event is bigger
than the size needed for the time extend, padding must be added.
The minimum padding size is 8 bytes. Thus if the event is 12 bytes
(size of time extend + 4), there will not be enough room to add both
the time extend and padding. Make sure all events are either 8 bytes
or 16 or more bytes.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

7d75e683
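
The size rule stated in 7d75e683 (events are either 8 bytes, or 16 or more) could be enforced with a one-line adjustment. This is a guess at the shape of such a check, not the actual patch. The macro names are hypothetical, and the 4-byte size granularity is an assumption taken from the "size of time extend + 4" wording.

```c
#define SIZE_ALIGNMENT   4   /* assumed event size granularity */
#define LEN_TIME_EXTEND  8   /* size of a time extend, per the message */

/*
 * A 12-byte event could hold the 8-byte time extend but not the 8-byte
 * minimum padding after it, so bump that one size up to 16 bytes.
 */
static unsigned int adjust_event_length(unsigned int length)
{
    if (length == LEN_TIME_EXTEND + SIZE_ALIGNMENT)
        length += SIZE_ALIGNMENT;
    return length;
}
```

Only the 12-byte case changes; every other size already leaves either no remainder or at least 8 bytes of room for padding.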

ring-buffer: Get timestamp after event is allocated · a4543a2f

Authored by Steven Rostedt (Red Hat) on May 29, 2015

Move the capturing of the timestamp to after an event is allocated.
If the event is not a commit (where it is an event that preempted
another event), then no timestamp is needed, because the delta of
nested events is always zero.

If the event starts on a new page, no delta needs to be calculated
as the full timestamp will be added to the page header, and the
event will have a delta of zero.

Now if the event requires a time extend (the delta does not fit
in the 27 bit delta slot in the header), then the event is discarded,
the length is extended to hold the TIME_EXTEND event that allows for
a 59 bit delta, and the commit is tried again.

If the event can't be discarded (another event came in after it),
then the TIME_EXTEND is added directly to the allocated event and
the rest of the event is given padding.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

a4543a2f
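
The 27-bit and 59-bit figures in a4543a2f fit together as 27 + 32 bits. The sketch below shows one way a delta could be tested and split. The struct layout and all names are assumptions made for this example, not a copy of the ring buffer's event format.

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_SHIFT 27                               /* delta bits in a normal event */
#define TS_MASK  ((UINT64_C(1) << TS_SHIFT) - 1)

/* Illustrative time extend: 27 low bits plus 32 high bits = 59 bits. */
struct time_extend_sketch {
    uint32_t low27;
    uint32_t high32;
};

/* True if the delta does not fit in the 27-bit slot of a normal event. */
static bool needs_time_extend(uint64_t delta)
{
    return (delta >> TS_SHIFT) != 0;
}

/* Split a delta of up to 59 bits into its two stored parts. */
static struct time_extend_sketch split_delta(uint64_t delta)
{
    struct time_extend_sketch te;

    te.low27 = (uint32_t)(delta & TS_MASK);
    te.high32 = (uint32_t)(delta >> TS_SHIFT);
    return te;
}

/* Reassemble the delta on the reader side. */
static uint64_t join_delta(struct time_extend_sketch te)
{
    return ((uint64_t)te.high32 << TS_SHIFT) | te.low27;
}
```

Any delta below 2^27 is stored directly in the event; larger deltas take the slower path that discards and re-reserves the event, or pads it.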

ring-buffer: Move the adding of the extended timestamp out of line · 9826b273

Authored by Steven Rostedt (Red Hat) on May 28, 2015

Requiring an extended time stamp is an uncommon occurrence, and it is
best to do it out of line when needed.

Add a noinline function that handles the extended timestamp and
have it called with an unlikely to completely move it out of the
fast path.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9826b273

ring-buffer: Add event descriptor to simplify passing data · fcc742ea

Authored by Steven Rostedt (Red Hat) on May 28, 2015

Add rb_event_info descriptor to pass event info to functions a bit
easier than using a bunch of parameters. This will also allow for
changing the code around a bit to find better fast paths.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

fcc742ea

29 May, 2015 3 commits

ring-buffer: Add enum names for the context levels · a497adb4

Authored by Steven Rostedt (Red Hat) on May 29, 2015

Instead of having hard coded numbers for the context levels, use
enums to describe them more.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

a497adb4

ring-buffer: Remove useless unused tracing_off_permanent() · 3c6296f7

Authored by Steven Rostedt (Red Hat) on May 28, 2015

The tracing_off_permanent() call is a way to disable all ring_buffers.
Nothing uses it and nothing should use it, as tracing_off() and
friends are better, as they disable the ring buffers related to
tracing. The tracing_off_permanent() even disabled non tracing
ring buffers. This is a bit drastic, and was added to handle NMIs
doing outputs that could corrupt the ring buffer when only tracing
used them. It is now obsolete and adds a little overhead, it should
be removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

3c6296f7

ring-buffer: Give NMIs a chance to lock the reader_lock · 289a5a25

Authored by Steven Rostedt (Red Hat) on May 28, 2015

Currently, if an NMI does a dump of a ring buffer, it disables
all ring buffers from ever doing any writes again. This is because
it won't take the locks for the cpu_buffer and this can cause
corruption if it preempted a read, or a read happens on another
CPU for the current cpu buffer. This is a bit overkill.

First, it should at least try to take the lock, and if it fails
then disable it. Also, there's no need to disable all ring
buffers, even those that are unrelated to what is being read.
Only disable the per cpu ring buffer that is being read if
it can not get the lock for it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

289a5a25
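
The "try the lock first, disable only this buffer on failure" policy from 289a5a25 can be sketched with a C11 `atomic_flag` standing in for the reader lock. All names are hypothetical, and a real NMI path has constraints that this userspace example does not model.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU buffer with a lock and a "stop recording" flag. */
struct cpu_buffer_sketch {
    atomic_flag reader_lock;   /* stand-in for the reader lock */
    bool record_disabled;      /* set when writes to this buffer are stopped */
};

/* Returns true if the lock was taken; otherwise disables only this buffer. */
static bool reader_lock_from_nmi(struct cpu_buffer_sketch *b)
{
    if (!atomic_flag_test_and_set(&b->reader_lock))
        return true;            /* lock was free and is now ours */
    b->record_disabled = true;  /* contended: stop writers on this buffer */
    return false;
}

static void reader_unlock(struct cpu_buffer_sketch *b)
{
    atomic_flag_clear(&b->reader_lock);
}
```

Other per-CPU buffers are untouched when one lock is contended, which is the narrowing of scope the commit message argues for.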

OpenHarmony / kernel_linux, last synced more than 4 years ago
