Commits · ece8e0b2f9c980e5511fe8db2d68c6f1859b9d83 · OpenHarmony / kernel_linux

20 Feb, 2013 · 1 commit

workqueue: un-GPL function delayed_work_timer_fn() · 1438ade5

Authored by Konstantin Khlebnikov on Jan 24, 2013

Commit d8e794df ("workqueue: set delayed_work->timer function on
initialization") exports delayed_work_timer_fn() only to GPL modules.
This makes delayed works unusable for non-GPL modules, because the
initialization macro now requires a GPL symbol. By contrast,
schedule_delayed_work(), for example, is available to non-GPL modules.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # 3.7

1438ade5

19 Feb, 2013 · 2 commits

cputime: Remove irqsave from seqlock readers · cdc4e86b

Authored by Thomas Gleixner on Feb 15, 2013

The reader side code has no requirement to disable interrupts while
sampling data. The sequence counter is enough to ensure consistency.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

cdc4e86b
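
The retry loop that makes a sequence counter sufficient on the reader side can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions: `struct sample`, `sample_write()` and `sample_read()` are hypothetical names, a single writer is assumed, and none of this is the kernel's seqlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a sequence-counter-protected pair. */
struct sample {
    atomic_uint seq;            /* even: stable, odd: write in progress */
    _Atomic uint64_t utime;
    _Atomic uint64_t stime;
};

/* Writer: make the count odd, update the data, make the count even again. */
static void sample_write(struct sample *s, uint64_t utime, uint64_t stime)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    assert((seq & 1u) == 0);    /* assumption: exactly one writer */
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->utime, utime, memory_order_relaxed);
    atomic_store_explicit(&s->stime, stime, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader: takes no lock and masks nothing; it retries if the count was odd
 * or changed while the data was being sampled. */
static void sample_read(struct sample *s, uint64_t *utime, uint64_t *stime)
{
    unsigned int begin, end;

    do {
        begin = atomic_load_explicit(&s->seq, memory_order_acquire);
        *utime = atomic_load_explicit(&s->utime, memory_order_relaxed);
        *stime = atomic_load_explicit(&s->stime, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((begin & 1u) || begin != end);
}
```

A reader that is interrupted in the middle of sampling simply observes a changed count and loops again, which is why the reader needs no extra protection for consistency.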

genirq: Export enable/disable_percpu_irq() · 36a5df85

Authored by Chris Metcalf on Feb 01, 2013

These functions are used by the tilegx onchip network driver, and it's
useful to be able to load that driver as a module.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Link: http://lkml.kernel.org/r/201302012043.r11KhNZF024371@farm-0021.internal.tilera.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

36a5df85

15 Feb, 2013 · 2 commits

posix-cpu-timers: Fix nanosleep task_struct leak · e6c42c29

Authored by Stanislaw Gruszka on Feb 15, 2013

The trinity fuzzer triggered a task_struct reference leak via
clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls
posix_cpu_timer_create(), but misses a corresponding
posix_cpu_timer_del() which leads to the task_struct reference leak.
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

e6c42c29
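
The create/delete pairing described in this commit can be illustrated with a small reference-counting sketch. All names here (`struct task`, `cpu_timer_create()`, `cpu_timer_del()`, `do_sleep()`) are hypothetical user-space stand-ins, not the kernel functions; the point is only that every path taking a reference needs a matching release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted task. */
struct task {
    int usage;
};

/* Hypothetical timer that pins its target task while it exists. */
struct cpu_timer {
    struct task *task;
};

/* Creating the timer takes a reference on the task. */
static void cpu_timer_create(struct cpu_timer *timer, struct task *t)
{
    t->usage++;
    timer->task = t;
}

/* Deleting the timer drops the reference taken at creation. */
static void cpu_timer_del(struct cpu_timer *timer)
{
    assert(timer->task != NULL);
    timer->task->usage--;
    timer->task = NULL;
}

/* Sleep path: without the final cpu_timer_del() call, every invocation
 * would leave the task's usage count one higher than before. */
static int do_sleep(struct task *t)
{
    struct cpu_timer timer;

    cpu_timer_create(&timer, t);
    /* ... arm the timer and wait for it to fire ... */
    cpu_timer_del(&timer);
    return 0;
}
```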

perf/hwbp: Fix cleanup in case of kzalloc failure · 02e176af

Authored by Daniel Baluta on Feb 06, 2013

Obviously this is a typo and could result in memory leaks if kzalloc
fails on a given cpu.
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360186160-7566-1-git-send-email-dbaluta@ixiacom.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

02e176af
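
The commit message does not say which identifier was mistyped, so the following is only a sketch of the general failure mode: an unwind loop after a failed per-CPU allocation must index with its own cursor, not with the index at which allocation failed. `NR_CPUS`, `fail_at`, `alloc_slot()` and `alloc_all()` are hypothetical names, and `calloc()` stands in for kzalloc().

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4               /* hypothetical CPU count for the sketch */

/* Hypothetical hook that forces an allocation failure on one CPU. */
static int fail_at = -1;

static void *alloc_slot(int cpu)
{
    return cpu == fail_at ? NULL : calloc(1, sizeof(unsigned int));
}

/* Returns 0 on success, or -1 with every earlier slot freed. */
static int alloc_all(unsigned int *slots[NR_CPUS])
{
    int cpu, err_cpu;

    assert(slots != NULL);
    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        slots[cpu] = alloc_slot(cpu);
        if (!slots[cpu])
            goto err_alloc;
    }
    return 0;

err_alloc:
    /* Index with err_cpu, the unwind cursor. Writing slots[cpu] here
     * would free the failed (NULL) slot over and over and leak every
     * slot that was allocated before it. */
    for (err_cpu = 0; err_cpu < cpu; err_cpu++) {
        free(slots[err_cpu]);
        slots[err_cpu] = NULL;
    }
    return -1;
}
```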

14 Feb, 2013 · 6 commits

stop_machine: Use smpboot threads · 14e568e7

Authored by Thomas Gleixner on Jan 31, 2013

Use the smpboot thread infrastructure. Mark the stopper thread
selfparking and park it after it has finished the take_cpu_down()
work.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

14e568e7

stop_machine: Store task reference in a separate per cpu variable · 860a0ffa

Authored by Thomas Gleixner on Jan 31, 2013

To allow the stopper thread to be managed by the smpboot thread
infrastructure, separate the task storage from the stopper data
structure.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.626690384@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

860a0ffa

smpboot: Allow selfparking per cpu threads · 7d7e499f

Authored by Thomas Gleixner on Jan 31, 2013

The stop machine threads are still killed when a cpu goes offline. The
reason is that the thread is used to bring the cpu down, so it can't
be parked along with the other per cpu threads.

Allow a per cpu thread to be excluded from automatic parking, so it
can park itself once it's done.

Add a create callback function as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.553993267@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

7d7e499f

workqueue: rename cpu_workqueue to pool_workqueue · 112202d9

Authored by Tejun Heo on Feb 13, 2013

workqueue has moved away from global_cwqs to worker_pools and with the
scheduled custom worker pools, workqueues will be associated with
pools which don't have anything to do with CPUs.  The workqueue code
went through a significant amount of change recently and mass renaming
isn't likely to hurt much additionally.  Let's replace 'cpu' with
'pool' so that it reflects the current design.

* s/struct cpu_workqueue_struct/struct pool_workqueue/
* s/cpu_wq/pool_wq/
* s/cwq/pwq/

This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>

112202d9

workqueue: reimplement is_chained_work() using current_wq_worker() · 8d03ecfe

Authored by Tejun Heo on Feb 13, 2013

is_chained_work() was added before current_wq_worker() and implemented
its own ham-fisted way of finding out whether %current is a workqueue
worker - it iterates through all possible workers.

Drop the custom implementation and reimplement using
current_wq_worker().
Signed-off-by: Tejun Heo <tj@kernel.org>

8d03ecfe

workqueue: fix is_chained_work() regression · 1dd63814

Authored by Tejun Heo on Feb 13, 2013

c9e7cf27 ("workqueue: move busy_hash from global_cwq to
worker_pool") incorrectly converted is_chained_work() to use
get_gcwq() inside for_each_gcwq_cpu() while removing get_gcwq().

As cwq might not exist for all possible workqueue CPUs, @cwq can be
NULL and the following cwq dereferences can lead to an oops.

Fix it by using for_each_cwq_cpu() instead, which is the better one to
use anyway as we only need to check pools that the wq is associated
with.
Signed-off-by: Tejun Heo <tj@kernel.org>

1dd63814

13 Feb, 2013 · 2 commits

kernel/pid.c: reenable interrupts when alloc_pid() fails because init has exited · 6e666884

Authored by Eric W. Biederman on Feb 12, 2013

We're forgetting to reenable local interrupts on an error path.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6e666884

clockevents: Fix generic broadcast for FEAT_C3STOP · 5d1d9a29

Authored by Mark Rutland on Feb 08, 2013

Commit 12ad1000: "clockevents: Add generic timer broadcast function"
made tick_device_uses_broadcast set up the generic broadcast function
for dummy devices (where !tick_device_is_functional(dev)), but neglected
to set up the broadcast function for devices that stop in low power
states (with the CLOCK_EVT_FEAT_C3STOP flag).

When these devices enter low power states they will not have the generic
broadcast function assigned, and will bring down the system when an
attempt is made to broadcast to them.

This patch ensures that the broadcast function is also assigned for
devices which require broadcast in low power states.
Reported-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Marc.Zyngier@arm.com
Cc: Will.Deacon@arm.com
Cc: santosh.shilimkar@ti.com
Cc: john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

5d1d9a29

10 Feb, 2013 · 1 commit

kprobes: fix wait_for_kprobe_optimizer() · ad72b3be

Authored by Tejun Heo on Dec 21, 2012

wait_for_kprobe_optimizer() seems largely broken.  It uses
optimizer_comp which is never re-initialized, so
wait_for_kprobe_optimizer() will never wait for anything once
kprobe_optimizer() finishes all pending jobs for the first time.

Also, aside from completion, delayed_work_pending() is %false once
kprobe_optimizer() starts execution and wait_for_kprobe_optimizer()
won't wait for it.

Reimplement it so that it flushes optimizing_work until
[un]optimizing_lists are empty.  Note that this also makes
optimizing_work execute immediately if someone's waiting for it, which
is the nicer behavior.

Only compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>

ad72b3be

09 Feb, 2013 · 26 commits

time, Fix setting of hardware clock in NTP code · 84e345e4

Authored by Prarit Bhargava on Feb 08, 2013

At init time, if the system time is "warped" forward in warp_clock()
it will differ from the hardware clock by sys_tz.tz_minuteswest.  This time
difference is not taken into account when ntp updates the hardware clock,
and this causes the system time to jump forward by this offset every reboot.

The kernel must take this offset into account when writing the system time
to the hardware clock in the ntp code.  This patch adds
persistent_clock_is_local which indicates that an offset has been applied
in warp_clock() and accounts for the "warp" before writing the hardware
clock.

x86 does not have this problem as rtc writes are software limited to a
+/-15 minute window relative to the current rtc time.  Other arches, such
as powerpc, however do a full synchronization of the system time to the
rtc and will see this problem.

[v2]: generated against tip/timers/core
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>

84e345e4
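
The adjustment described in this commit can be sketched as a small pure function. This is an illustration, not the kernel code: `tz_minuteswest` stands in for sys_tz.tz_minuteswest, `persistent_clock_is_local` mirrors the flag named in the message, and `rtc_time_from_system()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for sys_tz.tz_minuteswest and the
 * persistent_clock_is_local flag named in the commit message. */
static int tz_minuteswest;
static bool persistent_clock_is_local;

/* Compute the value to write to the hardware clock for a given system
 * time, undoing the forward "warp" applied at init time if there was one. */
static time_t rtc_time_from_system(time_t system_sec)
{
    time_t adjust = system_sec;

    /* Sanity bound for the sketch: an offset of less than one day. */
    assert(tz_minuteswest > -24 * 60 && tz_minuteswest < 24 * 60);

    if (persistent_clock_is_local)
        adjust -= (time_t)tz_minuteswest * 60;
    return adjust;
}
```

With the flag clear the system time is written unchanged; with it set, the offset is removed once on every write, so it no longer accumulates across reboots.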

uprobes/perf: Avoid uprobe_apply() whenever possible · b2fe8ba6

Authored by Oleg Nesterov on Feb 04, 2013

uprobe_perf_open/close call the costly uprobe_apply() every time,
we can avoid it if:

	- "nr_systemwide != 0" is not changed.

	- There is another process/thread with the same ->mm.

	- copy_process() does inherit_event(). dup_mmap() preserves the
	  inserted breakpoints.

	- event->attr.enable_on_exec == T, we can rely on uprobe_mmap()
	  called by exec/mmap paths.

	- tp_target is exiting. Only _close() checks PF_EXITING, I don't
	  think TRACE_REG_PERF_OPEN can hit the dying task too often.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

b2fe8ba6

uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE · f42d24a1

Authored by Oleg Nesterov on Feb 04, 2013

Change uprobe_trace_func() and uprobe_perf_func() to return "int". Change
uprobe_dispatcher() to return "trace_ret | perf_ret" although this is not
needed, currently TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive.

The only functional change is that uprobe_perf_func() checks the filtering
too and returns UPROBE_HANDLER_REMOVE if nobody wants to trace current.

Testing:

	# perf probe -x /lib/libc.so.6 syscall

	# perf record -e probe_libc:syscall -i perl -e 'fork; syscall -1 for 1..10; wait'

	# perf report --show-total-period
		100.00%            10     perl  libc-2.8.so    [.] syscall

Before this patch:

	# cat /sys/kernel/debug/tracing/uprobe_profile
		/lib/libc.so.6 syscall				20

A child process doesn't have a counter, but still it hits this breakpoint
"copied" by dup_mmap().

After the patch:

	# cat /sys/kernel/debug/tracing/uprobe_profile
		/lib/libc.so.6 syscall				11

The child process hits this int3 only once and does unapply_uprobe().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

f42d24a1
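
The "trace_ret | perf_ret" combination described in this commit can be sketched as follows. The flag values, `struct probe` and its `wants_current` field are assumptions made for the illustration; only the names uprobe_dispatcher(), uprobe_trace_func(), uprobe_perf_func() and UPROBE_HANDLER_REMOVE come from the message, and the functions below are simplified stand-ins for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values chosen for the sketch; treat them as assumptions. */
#define TP_FLAG_TRACE         1u
#define TP_FLAG_PROFILE       2u
#define UPROBE_HANDLER_REMOVE 1

/* Hypothetical, simplified probe state. */
struct probe {
    unsigned int flags;
    unsigned long nhit;
    bool wants_current;     /* does any perf event want the current task? */
};

/* Tracing handler: never asks for removal in this sketch. */
static int trace_func(struct probe *p)
{
    (void)p;
    return 0;
}

/* Perf handler: asks for the breakpoint to be removed from the current
 * task when nobody wants to trace it. */
static int perf_func(struct probe *p)
{
    return p->wants_current ? 0 : UPROBE_HANDLER_REMOVE;
}

/* Dispatcher: counts the hit, then ORs the handlers' return values. */
static int dispatcher(struct probe *p)
{
    int ret = 0;

    assert(p != NULL);
    p->nhit++;
    if (p->flags & TP_FLAG_TRACE)
        ret |= trace_func(p);
    if (p->flags & TP_FLAG_PROFILE)
        ret |= perf_func(p);
    return ret;
}
```

This matches the testing numbers in the message in spirit: an unwanted child still hits the breakpoint once, is counted, and then has the probe removed.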

uprobes/perf: Teach trace_uprobe/perf code to pre-filter · 31ba3348

Authored by Oleg Nesterov on Feb 04, 2013

Finally implement uprobe_perf_filter() which checks ->nr_systemwide or
->perf_events to figure out whether we need to insert the breakpoint.

uprobe_perf_open/close are changed to do uprobe_apply(true/false) when
the new perf event comes or goes away.

Note that currently this is very suboptimal:

	- uprobe_register() called by TRACE_REG_PERF_REGISTER becomes a
	  heavy nop, consumer->filter() always returns F at this stage.

	  As it was already discussed we need uprobe_register_only() to
	  avoid the costly register_for_each_vma() when possible.

	- uprobe_apply() is often overkill. Unless "nr_systemwide != 0"
	  changes we need uprobe_apply_mm(), unapply_uprobe() is almost
	  what we need.

	- uprobe_apply() can be simply avoided sometimes, see the next
	  changes.

Testing:

	# perf probe -x /lib/libc.so.6 syscall

	# perl -e 'syscall -1 while 1' &
	[1] 530

	# perf record -e probe_libc:syscall perl -e 'syscall -1 for 1..10; sleep 1'

	# perf report --show-total-period
		100.00%            10     perl  libc-2.8.so    [.] syscall

Before this patch:

	# cat /sys/kernel/debug/tracing/uprobe_profile
		/lib/libc.so.6 syscall				79291

A huge ->nrhit == 79291 reflects the fact that the background process
530 constantly hits this breakpoint too, even though it doesn't
contribute to the output.

After the patch:

	# cat /sys/kernel/debug/tracing/uprobe_profile
		/lib/libc.so.6 syscall				10

This shows that only the target process was punished by int3.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

31ba3348

uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's · 736288ba

Authored by Oleg Nesterov on Feb 03, 2013

Introduce "struct trace_uprobe_filter" which records the "active"
perf_event's attached to ftrace_event_call. To start with, we simply
use a list_head; we can optimize this later if needed. For example, we
do not really need to record an event with ->parent != NULL, we can
rely on parent->child_list. And we can certainly do some optimizations
for the case when 2 events have the same ->tp_target or tp_target->mm.

Change trace_uprobe_register() to process TRACE_REG_PERF_OPEN/CLOSE
and add/del this perf_event to the list.

We can probably avoid any locking, but let's start with the "obviously
correct" trace_uprobe_filter->rwlock which protects everything.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

736288ba

uprobes: Introduce uprobe_apply() · bdf8647c

Authored by Oleg Nesterov on Feb 03, 2013

Currently it is not possible to change the filtering constraints after
uprobe_register(), so a consumer can not, say, start to trace a task/mm
which was previously filtered out, or remove the no longer needed bp's.

Introduce uprobe_apply() which simply does register_for_each_vma() again
to consult uprobe_consumer->filter() and install/remove the breakpoints.
The only complication is that register_for_each_vma() can no longer
assume that uprobe->consumers should be consulted if is_register == T,
so we change it to accept "struct uprobe_consumer *new" instead.

Unlike uprobe_register(), uprobe_apply(true) doesn't do "unregister" if
register_for_each_vma() fails; it is up to the caller to handle the error.

Note: we probably need to clean up the current interface, it is strange
that uprobe_apply/unregister need inode/offset. We should either change
uprobe_register() to return "struct uprobe *", or add a private ->uprobe
member in uprobe_consumer. And in the long term uprobe_apply() should
take a single argument, uprobe or consumer, even "bool add" should go
away.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

bdf8647c

perf: Introduce hw_perf_event->tp_target and ->tp_list · f22c1bb6

Authored by Oleg Nesterov on Feb 02, 2013

sys_perf_event_open()->perf_init_event(event) is called before
find_get_context(event), this means that event->ctx == NULL when
class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
can't know if this event is per-task or system-wide.

This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
The patch also moves ->bp_target up so that it can overlap with the
new member, this can help the compiler to generate better code.

trace_uprobe_register() will use it for prefiltering to avoid the
unnecessary breakpoints in mm's we do not want to trace.

->tp_target doesn't have its own reference, but we can rely on the
fact that either sys_perf_event_open() holds a reference, or it is
equal to event->ctx->task. So this pointer is always valid until
free_event().

Also add the "struct list_head tp_list" into this union. It is not
strictly necessary, but it can simplify the next changes and we can
add it for free.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

f22c1bb6

uprobes/perf: Always increment trace_uprobe->nhit · 1b47aefd

Authored by Oleg Nesterov on Jan 31, 2013

Move tu->nhit++ from uprobe_trace_func() to uprobe_dispatcher().

->nhit counts how many times we hit the breakpoint inserted by this
uprobe, we do not want to lose this info if uprobe was enabled by
sys_perf_event_open().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

1b47aefd

uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe · a932b738

Authored by Oleg Nesterov on Jan 31, 2013

trace_uprobe->consumer and "struct uprobe_trace_consumer" add the
unnecessary indirection and complicate the code for no reason.

This patch simply embeds uprobe_consumer into "struct trace_uprobe",
all other changes only fix the compilation errors.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

a932b738

uprobes/tracing: Introduce is_trace_uprobe_enabled() · b64b0077

Authored by Oleg Nesterov on Jan 31, 2013

probe_event_enable/disable() check tu->consumer != NULL to avoid the
wrong uprobe_register/unregister().

We are going to kill this pointer and "struct uprobe_trace_consumer",
so we add the new helper, is_trace_uprobe_enabled(), which can rely
on TP_FLAG_TRACE/TP_FLAG_PROFILE instead.

Note: the current logic doesn't look optimal, it is not clear why
TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive, we will probably
change this later.

Also kill the unused TP_FLAG_UPROBE.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

b64b0077

uprobes/tracing: Ensure inode != NULL in create_trace_uprobe() · 7e4e28c5

Authored by Oleg Nesterov on Jan 28, 2013

probe_event_enable/disable() check tu->inode != NULL at the start.
This is ugly, if igrab() can fail create_trace_uprobe() should not
succeed and "postpone" the failure.

And S_ISREG(inode->i_mode) check added by d24d7dbf is not safe.

Note: alloc_uprobe() should probably check igrab() != NULL as well.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

7e4e28c5

uprobes/tracing: Fully initialize uprobe_trace_consumer before uprobe_register() · 4161824f

Authored by Oleg Nesterov on Jan 27, 2013

probe_event_enable() does uprobe_register() and only after that sets
utc->tu and tu->consumer/flags. This can race with uprobe_dispatcher()
which can miss these assignments or see them out of order. Nothing
really bad can happen, but this doesn't look clean/safe.

And this does not allow us to use uprobe_consumer->filter() we are going
to add, it is called by uprobe_register() and it needs utc->tu.

Change this code to initialize everything before uprobe_register(), and
reset tu->consumer/flags if it fails. We can't race with event_disable(),
the caller holds event_mutex, and if we could the code would be wrong
anyway.

In fact I think uprobe_trace_consumer should die, it buys nothing but
complicates the code. We can simply add uprobe_consumer into trace_uprobe.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

4161824f

uprobes/tracing: Fix dentry/mount leak in create_trace_uprobe() · 84d7ed79

Authored by Oleg Nesterov on Jan 27, 2013

create_trace_uprobe() does kern_path() to find ->d_inode, but forgets
to do path_put(). We can do this right after igrab().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

84d7ed79

uprobes: Add exports for module use · e8440c14

Authored by Josh Stone on Jan 13, 2013

The original pull message for uprobes (commit 654443e2) noted:

  This tree includes uprobes support in 'perf probe' - but SystemTap
  (and other tools) can take advantage of user probe points as well.

In order to actually be usable in module-based tools like SystemTap, the
interface needs to be exported.  This patch first adds the obvious
exports for uprobe_register and uprobe_unregister.  Then it also adds
one for task_user_regset_view, which is necessary to get the correct
state of userspace registers.
Signed-off-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>

e8440c14

uprobes: Kill the bogus IS_ERR_VALUE(xol_vaddr) check · af4355e9