提交 · a497adb45b8691f7e477e711a1a4bd54748d64fe · OpenHarmony / kernel_linux

29 5月, 2015 3 次提交

ring-buffer: Add enum names for the context levels · a497adb4

由 Steven Rostedt (Red Hat) 提交于 5月 29, 2015

Instead of having hard coded numbers for the context levels, use
enums to describe them more.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a497adb4

ring-buffer: Remove useless unused tracing_off_permanent() · 3c6296f7

由 Steven Rostedt (Red Hat) 提交于 5月 28, 2015

The tracing_off_permanent() call is a way to disable all ring_buffers.
Nothing uses it and nothing should use it, as tracing_off() and
friends are better, as they disable the ring buffers related to
tracing. The tracing_off_permanent() even disabled non tracing
ring buffers. This is a bit drastic, and was added to handle NMIs
doing outputs that could corrupt the ring buffer when only tracing
used them. It is now obsolete and adds a little overhead, it should
be removed.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3c6296f7

ring-buffer: Give NMIs a chance to lock the reader_lock · 289a5a25

由 Steven Rostedt (Red Hat) 提交于 5月 28, 2015

Currently, if an NMI does a dump of a ring buffer, it disables
all ring buffers from ever doing any writes again. This is because
it wont take the locks for the cpu_buffer and this can cause
corruption if it preempted a read, or a read happens on another
CPU for the current cpu buffer. This is a bit overkill.

First, it should at least try to take the lock, and if it fails
then disable it. Also, there's no need to disable all ring
buffers, even those that are unrelated to what is being read.
Only disable the per cpu ring buffer that is being read if
it can not get the lock for it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

289a5a25

27 5月, 2015 3 次提交

ring-buffer: Add trace_recursive checks to ring_buffer_write() · 985e871b

由 Steven Rostedt (Red Hat) 提交于 5月 27, 2015

The ring_buffer_write() function isn't protected by the trace recursive
writes. Luckily, this function is not used as much and is unlikely
to ever recurse. But it should still have the protection, because
even a call to ring_buffer_lock_reserve() could cause ring buffer
corruption if called when ring_buffer_write() is being used.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

985e871b

ring-buffer: Allways do the trace_recursive checks · 6776221b

由 Steven Rostedt (Red Hat) 提交于 5月 27, 2015

Currently the trace_recursive checks are only done if CONFIG_TRACING
is enabled. That was because there use to be a dependency with tracing
for the recursive checks (it used the task_struct trace recursive
variable). But now it uses its own variable and there is no dependency.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6776221b

ring-buffer: Move recursive check to per_cpu descriptor · 58a09ec6

由 Steven Rostedt (Red Hat) 提交于 5月 27, 2015

Instead of using a global per_cpu variable to perform the recursive
checks into the ring buffer, use the already existing per_cpu descriptor
that is part of the ring buffer itself.

Not only does this simplify the code, it also allows for one ring buffer
to be used within the guts of the use of another ring buffer. For example
trace_printk() can now be used within the ring buffer to record changes
done by an instance into the main ring buffer. The recursion checks
will prevent the trace_printk() itself from causing recursive issues
with the main ring buffer (it is just ignored), but the recursive
checks wont prevent the trace_printk() from recording other ring buffers.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

58a09ec6

22 5月, 2015 1 次提交

ring-buffer: Add unlikelys to make fast path the default · 3205f806

由 Steven Rostedt (Red Hat) 提交于 5月 21, 2015

I was running the trace_event benchmark and noticed that the times
to record a trace_event was all over the place. I looked at the assembly
of the ring_buffer_lock_reserver() and saw this:

 <ring_buffer_lock_reserve>:
       31 c0                   xor    %eax,%eax
       48 83 3d 76 47 bd 00    cmpq   $0x1,0xbd4776(%rip)        # ffffffff81d10d60 <ring_buffer_flags>
       01
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       75 1d                   jne    ffffffff8113c60d <ring_buffer_lock_reserve+0x2d>
       65 ff 05 69 e3 ec 7e    incl   %gs:0x7eece369(%rip)        # a960 <__preempt_count>
       8b 47 08                mov    0x8(%rdi),%eax
       85 c0                   test   %eax,%eax
 +---- 74 12                   je     ffffffff8113c610 <ring_buffer_lock_reserve+0x30>
 |     65 ff 0d 5b e3 ec 7e    decl   %gs:0x7eece35b(%rip)        # a960 <__preempt_count>
 |     0f 84 85 00 00 00       je     ffffffff8113c690 <ring_buffer_lock_reserve+0xb0>
 |     31 c0                   xor    %eax,%eax
 |     5d                      pop    %rbp
 |     c3                      retq
 |     90                      nop
 +---> 65 44 8b 05 48 e3 ec    mov    %gs:0x7eece348(%rip),%r8d        # a960 <__preempt_count>
       7e
       41 81 e0 ff ff ff 7f    and    $0x7fffffff,%r8d
       b0 08                   mov    $0x8,%al
       65 8b 0d 58 36 ed 7e    mov    %gs:0x7eed3658(%rip),%ecx        # fc80 <current_context>
       41 f7 c0 00 ff 1f 00    test   $0x1fff00,%r8d
       74 1e                   je     ffffffff8113c64f <ring_buffer_lock_reserve+0x6f>
       41 f7 c0 00 00 10 00    test   $0x100000,%r8d
       b0 01                   mov    $0x1,%al
       75 13                   jne    ffffffff8113c64f <ring_buffer_lock_reserve+0x6f>
       41 81 e0 00 00 0f 00    and    $0xf0000,%r8d
       49 83 f8 01             cmp    $0x1,%r8
       19 c0                   sbb    %eax,%eax
       83 e0 02                and    $0x2,%eax
       83 c0 02                add    $0x2,%eax
       85 c8                   test   %ecx,%eax
       75 ab                   jne    ffffffff8113c5fe <ring_buffer_lock_reserve+0x1e>
       09 c8                   or     %ecx,%eax
       65 89 05 24 36 ed 7e    mov    %eax,%gs:0x7eed3624(%rip)        # fc80 <current_context>

The arrow is the fast path.

After adding the unlikely's, the fast path looks a bit better:

 <ring_buffer_lock_reserve>:
       31 c0                   xor    %eax,%eax
       48 83 3d 76 47 bd 00    cmpq   $0x1,0xbd4776(%rip)        # ffffffff81d10d60 <ring_buffer_flags>
       01
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       75 7b                   jne    ffffffff8113c66b <ring_buffer_lock_reserve+0x8b>
       65 ff 05 69 e3 ec 7e    incl   %gs:0x7eece369(%rip)        # a960 <__preempt_count>
       8b 47 08                mov    0x8(%rdi),%eax
       85 c0                   test   %eax,%eax
       0f 85 9f 00 00 00       jne    ffffffff8113c6a1 <ring_buffer_lock_reserve+0xc1>
       65 8b 0d 57 e3 ec 7e    mov    %gs:0x7eece357(%rip),%ecx        # a960 <__preempt_count>
       81 e1 ff ff ff 7f       and    $0x7fffffff,%ecx
       b0 08                   mov    $0x8,%al
       65 8b 15 68 36 ed 7e    mov    %gs:0x7eed3668(%rip),%edx        # fc80 <current_context>
       f7 c1 00 ff 1f 00       test   $0x1fff00,%ecx
       75 50                   jne    ffffffff8113c670 <ring_buffer_lock_reserve+0x90>
       85 d0                   test   %edx,%eax
       75 7d                   jne    ffffffff8113c6a1 <ring_buffer_lock_reserve+0xc1>
       09 d0                   or     %edx,%eax
       65 89 05 53 36 ed 7e    mov    %eax,%gs:0x7eed3653(%rip)        # fc80 <current_context>
       65 8b 05 fc da ec 7e    mov    %gs:0x7eecdafc(%rip),%eax        # a130 <cpu_number>
       89 c2                   mov    %eax,%edx
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3205f806

14 5月, 2015 13 次提交

tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call · a7237765

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_raw_##call structures are built
by macros for trace events. They have nothing to do with function tracing.
Rename them.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a7237765

tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled() · 09a5059a

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_trigger_soft_disabled() tests if a
trace_event is soft disabled (called but not traced), and returns true if
it is. It has nothing to do with function tracing and should be renamed.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

09a5059a

tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_* · 5d6ad960

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The FTRACE_EVENT_FL_* flags are flags to
do with the trace_event files in the tracefs directory. They are not related
to function tracing. Rename them to a more descriptive name.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5d6ad960

tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir · 7967b3e0

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_subsystem_dir holds
the information about trace event subsystems. It should not be named
ftrace, rename it to trace_subsystem_dir.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7967b3e0

tracing: Rename ftrace_event_name() to trace_event_name() · 687fcc4a

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. ftrace_event_name() returns the name of
an event tracepoint, has nothing to do with function tracing. Rename it
to trace_event_name().
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

687fcc4a

tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX · 609a7404

由 Steven Rostedt (Red Hat) 提交于 5月 13, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. Rename the max trace_event type size to
something more descriptive and appropriate.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

609a7404

tracing: Rename ftrace_output functions to trace_output · 892c505a

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_output_*() and ftrace_raw_output_*()
functions represent the trace_event code. Rename them to just trace_output
or trace_raw_output.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

892c505a

tracing: Rename ftrace_event_buffer to trace_event_buffer. · 3f795dcf

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_event_buffer functions and data
structures are for trace_events and not for function hooks. Rename them
to trace_event_buffer*.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3f795dcf

tracing: Rename ftrace_event_{call,class} to trace_event_{call,class} · 2425bcb9

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2425bcb9

tracing: Rename ftrace_event_file to trace_event_file · 7f1d2f82

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_file is really
about trace events and not "ftrace". Rename it to trace_event_file.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7f1d2f82

tracing: Rename (un)register_ftrace_event() to (un)register_trace_event() · 9023c930

由 Steven Rostedt (Red Hat) 提交于 5月 05, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The functions (un)register_ftrace_event() is
really about trace_events, and the name should be register_trace_event()
instead.

Also renamed ftrace_event_reg() to trace_event_reg() for the same reason.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9023c930

tracing: Rename ftrace_print_*() functions ta trace_print_*() · 645df987

由 Steven Rostedt (Red Hat) 提交于 5月 04, 2015

The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The functions ftrace_print_*() are not part of
the function infrastructure, and the names can be confusing. Rename them
to be trace_print_*().
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

645df987

tracing: Rename ftrace_event.h to trace_events.h · af658dca

由 Steven Rostedt (Red Hat) 提交于 4月 29, 2015

The term "ftrace" is really the infrastructure of the function hooks,
and not the trace events. Rename ftrace_event.h to trace_events.h to
represent the trace_event infrastructure and decouple the term ftrace
from it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

af658dca

13 5月, 2015 2 次提交

ftrace: Provide trace clock monotonic raw · aabfa5f2

由 Drew Richardson 提交于 5月 08, 2015

Expose the NMI safe accessor to the monotonic raw clock to the
tracer. The mono clock was added with commit
1b3e5c09. The advantage of the
monotonic raw clock is that it will advance more constantly than the
monotonic clock.

Imagine someone is trying to optimize a particular program to reduce
instructions executed for a given workload while minimizing the effect
on runtime. Also suppose that NTP is running and potentially making
larger adjustments to the monotonic clock. If NTP is adjusting the
monotonic clock to advance more rapidly, the program will appear to
use fewer instructions per second but run longer than if the monotonic
raw clock had been used. The total number of instructions observed
would be the same regardless of the clock source used, but how it's
attributed to time would be affected.

Conversely if NTP is adjusting the monotonic clock to advance more
slowly, the program will appear to use more instructions per second
but run more quickly. Of course there are many sources that can cause
jitter in performance measurements on modern processors, but let's
remove NTP from the list.

The monotonic raw clock can also be useful for tracing early boot,
e.g. when debugging issues with NTP.

Link: http://lkml.kernel.org/r/20150508143037.GB1276@dreric01-Precision-T1650Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NDrew Richardson <drew.richardson@arm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

aabfa5f2

tracing: Export tracing clock functions · 7e255d34

由 Jerry Snitselaar 提交于 4月 30, 2015

Critical tracepoint hooks should never call anything that takes a lock,
so they are unable to call getrawmonotonic() or ktime_get().

Export the rest of the tracing clock functions so can be used in
tracepoint hooks.

Background: We have a customer that adds their own module and registers
a tracepoint hook to sched_wakeup. They were using ktime_get() for a
time source, but it grabs a seq lock and caused a deadlock to occur.

Link: http://lkml.kernel.org/r/1430406624-22609-1-git-send-email-jsnitsel@redhat.comSigned-off-by: NJerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7e255d34

07 5月, 2015 1 次提交

tracing: Make ftrace_print_array_seq compute buf_len · ac01ce14

由 Alex Bennée 提交于 4月 29, 2015

The only caller to this function (__print_array) was getting it wrong by
passing the array length instead of buffer length. As the element size
was already being passed for other reasons it seems reasonable to push
the calculation of buffer length into the function.

Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.orgSigned-off-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ac01ce14

01 5月, 2015 1 次提交

modsign: change default key details · 9c4249c8

由 David Howells 提交于 4月 30, 2015

Change default key details to be more obviously unspecified.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NJames Morris <james.l.morris@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c4249c8

29 4月, 2015 1 次提交

cpuidle: Run tick_broadcast_exit() with disabled interrupts · df8d9eea

由 Rafael J. Wysocki 提交于 4月 29, 2015

Commit 335f4919 (sched/idle: Use explicit broadcast oneshot
control function) replaced clockevents_notify() invocations in
cpuidle_idle_call() with direct calls to tick_broadcast_enter()
and tick_broadcast_exit(), but it overlooked the fact that
interrupts were already enabled before calling the latter which
led to functional breakage on systems using idle states with the
CPUIDLE_FLAG_TIMER_STOP flag set.

Fix that by moving the invocations of tick_broadcast_enter()
and tick_broadcast_exit() down into cpuidle_enter_state() where
interrupts are still disabled when tick_broadcast_exit() is
called.  Also ensure that interrupts will be disabled before
running tick_broadcast_exit() even if they have been enabled by
the idle state's ->enter callback.  Trigger a WARN_ON_ONCE() in
that case, as we generally don't want that to happen for states
with CPUIDLE_FLAG_TIMER_STOP set.

Fixes: 335f4919 (sched/idle: Use explicit broadcast oneshot control function)
Reported-and-tested-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

df8d9eea

28 4月, 2015 1 次提交

bpf: fix 64-bit divide · 876a7ae6

由 Alexei Starovoitov 提交于 4月 27, 2015

ALU64_DIV instruction should be dividing 64-bit by 64-bit,
whereas do_div() does 64-bit by 32-bit divide.
x64 and arm64 JITs correctly implement 64 by 64 unsigned divide.
llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64.

Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets")
Reported-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

876a7ae6

27 4月, 2015 1 次提交

x86: pvclock: Really remove the sched notifier for cross-cpu migrations · 73459e2a

由 Paolo Bonzini 提交于 4月 23, 2015

This reverts commits 0a4e6be9
and 80f7fdb1.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Suggested-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73459e2a

25 4月, 2015 2 次提交

clockevents: Shutdown detached clockevent device · 149aabcc

由 Viresh Kumar 提交于 4月 10, 2015

A clockevent device is marked DETACHED when it is replaced by another
clockevent device.

The device is shutdown properly for drivers that implement legacy
->set_mode() callback, as we call ->set_mode() for CLOCK_EVT_MODE_UNUSED
as well.

But for the new per-state callback interface, we skip shutting down the
device, as we thought its an internal state change. That wasn't correct.

The effect is that the device is left programmed in oneshot or periodic
mode.

Fall-back to 'case CLOCK_EVT_STATE_SHUTDOWN', to shutdown the device.

Fixes: bd624d75 "clockevents: Introduce mode specific callbacks"
Reported-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/eef0a91c51b74d4e52c8e5a95eca27b5a0563f07.1428650683.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

149aabcc

genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip · 10a50f1a

由 Roger Quadros 提交于 4月 15, 2015

Without this system suspend is broken on systems that have
drivers calling enable/disable_irq_wake() for interrupts based off
the dummy irq hook. (e.g. drivers/gpio/gpio-pcf857x.c)
Signed-off-by: NRoger Quadros <rogerq@ti.com>
Cc: <cw00.choi@samsung.com>
Cc: <balbi@ti.com>
Cc: <tony@atomide.com>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Link: http://lkml.kernel.org/r/552E1DD3.4040106@ti.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

10a50f1a

23 4月, 2015 1 次提交

kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP · 7e01b5ac

由 Martin Schwidefsky 提交于 4月 16, 2015

Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code
to override the gfp flags of the allocation for the kexec control
page. The loop in kimage_alloc_normal_control_pages allocates pages
with GFP_KERNEL until a page is found that happens to have an
address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems
with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT
the loop will keep allocating memory until the oom killer steps in.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

7e01b5ac

20 4月, 2015 1 次提交

smp: Fix error case handling in smp_call_function_*() · 5224b961

由 Linus Torvalds 提交于 4月 19, 2015

Commit 8053871d ("smp: Fix smp_call_function_single_async()
locking") fixed the locking for the asynchronous smp-call case, but in
the process of moving the lock handling around, one of the error cases
ended up not unlocking the call data at all.

This went unnoticed on x86, because this is a "caller is buggy" case,
where the caller is trying to call a non-existent CPU.  But apparently
ARM does that (at least under qemu-arm).  Bindly doing cross-cpu calls
to random CPU's that aren't even online seems a bit fishy, but the error
handling was clearly not correct.

Simply add the missing "csd_unlock()" to the error path.
Reported-and-tested-by: NGuenter Roeck <linux@roeck-us.net>
Analyzed-by: NRabin Vincent <rabin@rab.in>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5224b961

17 4月, 2015 9 次提交

tracing: Fix possible out of bounds memory access when parsing enums · 3193899d

由 Steven Rostedt (Red Hat) 提交于 4月 17, 2015

The code that replaces the enum names with the enum values in the
tracepoints' format files could possible miss the end of string nul
character. This was caused by processing things like backslashes, quotes
and other tokens. After processing the tokens, a check for the nul
character needed to be done before continuing the loop, because the loop
incremented the pointer before doing the check, which could bypass the nul
character.

Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com

Reported-by: Sasha Levin <sasha.levin@oracle.com> # via KASan
Tested-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
Fixes: 0c564a53 "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values"
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3193899d

oprofile: reduce mmap_sem hold for mm->exe_file · 11163348

由 Davidlohr Bueso 提交于 4月 16, 2015

sync_buffer() needs the mmap_sem for two distinct operations, both only
occurring upon user context switch handling:

 1) Dealing with the exe_file.

 2) Adding the dcookie data as we need to lookup the vma that
   backs it. This is done via add_sample() and add_data().

This patch isolates 1), for it will no longer need the mmap_sem for
serialization.  However, for now, make of the more standard
get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
and relying on reference counting to make sure that the exe file won't
dissappear underneath us while doing the get dcookie.

As a consequence, for 2) we move the mmap_sem locking into where we really
need it, in lookup_dcookie().  The benefits are twofold: reduce mmap_sem
hold times, and cleaner code.

[akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko]
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

11163348

gcov: fix softlockups · 9d796e66

由 Andrey Ryabinin 提交于 4月 16, 2015

gcov profiling if enabled with other heavy compile-time instrumentation
like KASan could trigger following softlockups:

  NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
  Modules linked in:
  irq event stamp: 22823276
  hardirqs last  enabled at (22823275): [<ffffffff86e8d10d>] mutex_lock_nested+0x7d9/0x930
  hardirqs last disabled at (22823276): [<ffffffff86e9521d>] apic_timer_interrupt+0x6d/0x80
  softirqs last  enabled at (22823172): [<ffffffff811ed969>] __do_softirq+0x4db/0x729
  softirqs last disabled at (22823167): [<ffffffff811edfcf>] irq_exit+0x7d/0x15b
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       3.19.0-05245-gbb33326-dirty #3
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
  task: ffff88006cba8000 ti: ffff88006cbb0000 task.ti: ffff88006cbb0000
  RIP: kasan_mem_to_shadow+0x1e/0x1f
  Call Trace:
    strcmp+0x28/0x70
    get_node_by_name+0x66/0x99
    gcov_event+0x4f/0x69e
    gcov_enable_events+0x54/0x7b
    gcov_fs_init+0xf8/0x134
    do_one_initcall+0x1b2/0x288
    kernel_init_freeable+0x467/0x580
    kernel_init+0x15/0x18b
    ret_from_fork+0x7c/0xb0
  Kernel panic - not syncing: softlockup: hung tasks

Fix this by sticking cond_resched() in gcov_enable_events().
Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9d796e66

kernel/sysctl.c: detect overflows when converting to int · 230633d1

由 Heinrich Schuchardt 提交于 4月 16, 2015

When converting unsigned long to int overflows may occur.  These currently
are not detected when writing to the sysctl file system.

E.g. on a system where int has 32 bits and long has 64 bits
  echo 0x800001234 > /proc/sys/kernel/threads-max
has the same effect as
  echo 0x1234 > /proc/sys/kernel/threads-max

The patch adds the missing check in do_proc_dointvec_conv.

With the patch an overflow will result in an error EINVAL when writing to
the the sysctl file system.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

230633d1

prctl: avoid using mmap_sem for exe_file serialization · 6e399cd1

由 Davidlohr Bueso 提交于 4月 16, 2015

Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case. For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe. As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock. Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6e399cd1

mm: rcu-protected get_mm_exe_file() · 90f31d0e

由 Konstantin Khlebnikov 提交于 4月 16, 2015

This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.

[akpm@linux-foundation.org: fix comment typo]
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90f31d0e

kernel/sysctl.c: threads-max observe limits · 16db3d3f

由 Heinrich Schuchardt 提交于 4月 16, 2015

Users can change the maximum number of threads by writing to
/proc/sys/kernel/threads-max.

With the patch the value entered is checked against the same limits that
apply when fork_init is called.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

16db3d3f

kernel/fork.c: avoid division by zero · ac1b398d

由 Heinrich Schuchardt 提交于 4月 16, 2015

PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g.  architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With 32-bit calculation there is no solution which delivers valid results
for all possible combinations of the parameters.  The code is only called
once.  Hence a 64-bit calculation can be used as solution.

[akpm@linux-foundation.org: use clamp_t(), per Oleg]
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ac1b398d

kernel/fork.c: new function for max_threads · ff691f6e

由 Heinrich Schuchardt 提交于 4月 16, 2015

PAGE_SIZE is not guaranteed to be equal to or less than 8 times the
THREAD_SIZE.

E.g.  architecture hexagon may have page size 1M and thread size 4096.
This would lead to a division by zero in the calculation of max_threads.

With this patch the buggy code is moved to a separate function
set_max_threads.  The error is not fixed.

After fixing the problem in a separate patch the new function can be
reused to adjust max_threads after adding or removing memory.

Argument mempages of function fork_init() is removed as totalram_pages is
an exported symbol.

The creation of separate patches for refactoring to a new function and for
fixing the logic was suggested by Ingo Molnar.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ff691f6e

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多