提交 · 2e33af029556cb8bd22bf4f86f42d540249177ea · OpenHarmony / kernel_linux

15 5月, 2010 2 次提交

tracing: Move fields from event to class structure · 2e33af02

由 Steven Rostedt 提交于 4月 22, 2010

Move the defined fields from the event to the class structure.
Since the fields of the event are defined by the class they belong
to, it makes sense to have the class hold the information instead
of the individual events. The events of the same class would just
hold duplicate information.

After this change the size of the kernel dropped another 3K:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields

Although the text increased, this was mainly due to the C files
having to adapt to the change. This is a constant increase, where
new tracepoints will not increase the Text. But the big drop is
in the data size (as well as needed allocations to hold the fields).
This will give even more savings as more tracepoints are created.

Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
with several DEFINE_EVENT()s, then the savings will be lost. But
we are pushing developers to consolidate events with DEFINE_EVENT()
so this should not be an issue.

The kprobes define a unique class to every new event, but are dynamic
so it should not be a issue.

The syscalls however have a single class but the fields for the individual
events are different. The syscalls use a metadata to define the
fields. I moved the fields list from the event to the metadata and
added a "get_fields()" function to the class. This function is used
to find the fields. For normal events and kprobes, get_fields() just
returns a pointer to the fields list_head in the class. For syscall
events, it returns the fields list_head in the metadata for the event.

v2:  Fixed the syscall fields. The syscall metadata needs a list
     of fields for both enter and exit.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2e33af02

tracing: Remove per event trace registering · 2239291a

由 Steven Rostedt 提交于 4月 21, 2010

This patch removes the register functions of TRACE_EVENT() to enable
and disable tracepoints. The registering of a event is now down
directly in the trace_events.c file. The tracepoint_probe_register()
is now called directly.

The prototypes are no longer type checked, but this should not be
an issue since the tracepoints are created automatically by the
macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
other macros will catch it.

The trace_event_class structure now holds the probes to be called
by the callbacks. This removes needing to have each event have
a separate pointer for the probe.

To handle kprobes and syscalls, since they register probes in a
different manner, a "reg" field is added to the ftrace_event_class
structure. If the "reg" field is assigned, then it will be called for
enabling and disabling of the probe for either ftrace or perf. To let
the reg function know what is happening, a new enum (trace_reg) is
created that has the type of control that is needed.

With this new rework, the 82 kernel events and 618 syscall events
has their footprint dramatically lowered:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class
4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
4900252	1057412	 861512	6819176	 680d68	vmlinux.regs

The size went from 6863829 to 6819176, that's a total of 44K
in savings. With tracepoints being continuously added, this is
critical that the footprint becomes minimal.

v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
    specific structure in trace_events.c.

v4: Fixed trace self tests to check probe because regfunc no longer
    exists.

v3: Updated to handle void *data in beginning of probe parameters.
    Also added the tracepoint: check_trace_callback_type_##call().

v2: Changed the callback probes to pass void * and typecast the
    value within the function.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2239291a

14 5月, 2010 3 次提交

tracing: Let tracepoints have data passed to tracepoint callbacks · 38516ab5

由 Steven Rostedt 提交于 4月 20, 2010

This patch adds data to be passed to tracepoint callbacks.

The created functions from DECLARE_TRACE() now need a mandatory data
parameter. For example:

DECLARE_TRACE(mytracepoint, int value, value)

Will create the register function:

int register_trace_mytracepoint((void(*)(void *data, int value))probe,
                                void *data);

As the first argument, all callbacks (probes) must take a (void *data)
parameter. So a callback for the above tracepoint will look like:

void myprobe(void *data, int value)
{
}

The callback may choose to ignore the data parameter.

This change allows callbacks to register a private data pointer along
with the function probe.

	void mycallback(void *data, int value);

	register_trace_mytracepoint(mycallback, mydata);

Then the mycallback() will receive the "mydata" as the first parameter
before the args.

A more detailed example:

  DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  /* In the C file */

  DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

  [...]

       trace_mytracepoint(status);

  /* In a file registering this tracepoint */

  int my_callback(void *data, int status)
  {
	struct my_struct my_data = data;
	[...]
  }

  [...]
	my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
	init_my_data(my_data);
	register_trace_mytracepoint(my_callback, my_data);

The same callback can also be registered to the same tracepoint as long
as the data registered is different. Note, the data must also be used
to unregister the callback:

	unregister_trace_mytracepoint(my_callback, my_data);

Because of the data parameter, tracepoints declared this way can not have
no args. That is:

  DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

will cause an error.

If no arguments are needed, a new macro can be used instead:

  DECLARE_TRACE_NOARGS(mytracepoint);

Since there are no arguments, the proto and args fields are left out.

This is part of a series to make the tracepoint footprint smaller:

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class
4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint

Again, this patch also increases the size of the kernel, but
lays the ground work for decreasing it.

 v5: Fixed net/core/drop_monitor.c to handle these updates.

 v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
     #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
     cases. The __DECLARE_TRACE() is what changes.
     Thanks to Frederic Weisbecker for pointing this out.

 v3: Made all register_* functions require data to be passed and
     all callbacks to take a void * parameter as its first argument.
     This makes the calling functions comply with C standards.

     Also added more comments to the modifications of DECLARE_TRACE().

 v2: Made the DECLARE_TRACE() have the ability to pass arguments
     and added a new DECLARE_TRACE_NOARGS() for tracepoints that
     do not need any arguments.
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

38516ab5

tracepoints: Add check trace callback type · 53da59aa

由 Mathieu Desnoyers 提交于 4月 30, 2010

This check is meant to be used by tracepoint users which do a direct cast of
callbacks to (void *) for direct registration, thus bypassing the
register_trace_##name and unregister_trace_##name checks.

This permits to ensure that the callback type matches the function type at the
call site, but without generating any code.
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20100430165959.GA25605@Krystal>
CC: Ingo Molnar <mingo@elte.hu>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

53da59aa

tracing: Create class struct for events · 8f082018

由 Steven Rostedt 提交于 4月 20, 2010

This patch creates a ftrace_event_class struct that event structs point to.
This class struct will be made to hold information to modify the
events. Currently the class struct only holds the events system name.

This patch slightly increases the size, but this change lays the ground work
of other changes to make the footprint of tracepoints smaller.

With 82 standard tracepoints, and 618 system call tracepoints
(two tracepoints per syscall: enter and exit):

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4914025	1088868	 861512	6864405	 68be15	vmlinux.class

This patch also cleans up some stale comments in ftrace.h.

v2: Fixed missing semi-colon in macro.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8f082018

11 5月, 2010 1 次提交

sched, wait: Use wrapper functions · a93d2f17

由 Changli Gao 提交于 5月 07, 2010

epoll should not touch flags in wait_queue_t. This patch introduces a new
function __add_wait_queue_exclusive(), for the users, who use wait queue as a
LIFO queue.

__add_wait_queue_tail_exclusive() is introduced too instead of
add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as
it is a duplicate of __remove_wait_queue(), disliked by users, and with less
users.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: <containers@lists.linux-foundation.org>
LKML-Reference: <1273214006-2979-1-git-send-email-xiaosuo@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a93d2f17

10 5月, 2010 2 次提交

sched: Intoduce get_cpu_iowait_time_us() · 0224cf4c

由 Arjan van de Ven 提交于 5月 09, 2010

For the ondemand cpufreq governor, it is desired that the iowait
time is microaccounted in a similar way as idle time is.

This patch introduces the infrastructure to account and expose
this information via the get_cpu_iowait_time_us() function.

[akpm@linux-foundation.org: fix CONFIG_NO_HZ=n build]
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082523.284feab6@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0224cf4c

sched: Eliminate the ts->idle_lastupdate field · e0e37c20

由 Arjan van de Ven 提交于 5月 09, 2010

Now that the only user of ts->idle_lastupdate is
update_ts_time_stats(), the entire field can be eliminated.

In update_ts_time_stats(), idle_lastupdate is first set to
"now", and a few lines later, the only user is an if() statement
that assigns a variable either to "now" or to
ts->idle_lastupdate, which has the value of "now" at that point.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082439.2fab0b4f@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e0e37c20

08 5月, 2010 1 次提交

cpu_stop: add dummy implementation for UP · bbf1bb3e

由 Tejun Heo 提交于 5月 08, 2010

When !CONFIG_SMP, cpu_stop functions weren't defined at all which
could lead to build failures if UP code uses cpu_stop facility.  Add
dummy cpu_stop implementation for UP.  The waiting variants execute
the work function directly with preempt disabled and
stop_one_cpu_nowait() schedules a workqueue work.

Makefile and ifdefs around stop_machine implementation are updated to
accomodate CONFIG_SMP && !CONFIG_STOP_MACHINE case.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NIngo Molnar <mingo@elte.hu>

bbf1bb3e

07 5月, 2010 4 次提交

sched: Remove rq argument to the tracepoints · 27a9da65

由 Peter Zijlstra 提交于 5月 04, 2010

struct rq isn't visible outside of sched.o so its near useless to
expose the pointer, also there are no users of it, so remove it.
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272997616.1642.207.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

27a9da65

sched: replace migration_thread with cpu_stop · 969c7921

由 Tejun Heo 提交于 5月 06, 2010

Currently migration_thread is serving three purposes - migration
pusher, context to execute active_load_balance() and forced context
switcher for expedited RCU synchronize_sched.  All three roles are
hardcoded into migration_thread() and determining which job is
scheduled is slightly messy.

This patch kills migration_thread and replaces all three uses with
cpu_stop.  The three different roles of migration_thread() are
splitted into three separate cpu_stop callbacks -
migration_cpu_stop(), active_load_balance_cpu_stop() and
synchronize_sched_expedited_cpu_stop() - and each use case now simply
asks cpu_stop to execute the callback as necessary.

synchronize_sched_expedited() was implemented with private
preallocated resources and custom multi-cpu queueing and waiting
logic, both of which are provided by cpu_stop.
synchronize_sched_expedited_count is made atomic and all other shared
resources along with the mutex are dropped.

synchronize_sched_expedited() also implemented a check to detect cases
where not all the callback got executed on their assigned cpus and
fall back to synchronize_sched().  If called with cpu hotplug blocked,
cpu_stop already guarantees that and the condition cannot happen;
otherwise, stop_machine() would break.  However, this patch preserves
the paranoid check using a cpumask to record on which cpus the stopper
ran so that it can serve as a bisection point if something actually
goes wrong theree.

Because the internal execution state is no longer visible,
rcu_expedited_torture_stats() is removed.

This patch also renames cpu_stop threads to from "stopper/%d" to
"migration/%d".  The names of these threads ultimately don't matter
and there's no reason to make unnecessary userland visible changes.

With this patch applied, stop_machine() and sched now share the same
resources.  stop_machine() is faster without wasting any resources and
sched migration users are much cleaner.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>

969c7921

stop_machine: reimplement using cpu_stop · 3fc1f1e2

由 Tejun Heo 提交于 5月 06, 2010

Reimplement stop_machine using cpu_stop.  As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and removed.

With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler.  Asking the cpu_stop to execute
the stop_cpu() state machine on all online cpus with cpu hotplug
disabled is enough.

stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.

The previous implementation created and destroyed RT workqueues as
necessary which made stop_machine() calls highly expensive on very
large machines.  According to Dimitri Sivanich, preventing the dynamic
creation/destruction makes booting faster more than twice on very
large machines.  cpu_stop resources are preallocated for all online
cpus and should have the same effect.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>

3fc1f1e2

cpu_stop: implement stop_cpu[s]() · 1142d810

由 Tejun Heo 提交于 5月 06, 2010

Implement a simplistic per-cpu maximum priority cpu monopolization
mechanism.  A non-sleeping callback can be scheduled to run on one or
multiple cpus with maximum priority monopolozing those cpus.  This is
primarily to replace and unify RT workqueue usage in stop_machine and
scheduler migration_thread which currently is serving multiple
purposes.

Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
stop_cpus() and try_stop_cpus().

This is to allow clean sharing of resources among stop_cpu and all the
migration thread users.  One stopper thread per cpu is created which
is currently named "stopper/CPU".  This will eventually replace the
migration thread and take on its name.

* This facility was originally named cpuhog and lived in separate
  files but Peter Zijlstra nacked the name and thus got renamed to
  cpu_stop and moved into stop_machine.c.

* Better reporting of preemption leak as per Peter's suggestion.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>

1142d810

05 5月, 2010 1 次提交

tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header · 2e26ca71

由 Steven Rostedt 提交于 5月 05, 2010

When more than one header is included under CREATE_TRACE_POINTS
the DECLARE_TRACE() macro is not defined back to its original meaning
and the second include will fail to initialize the TRACE_EVENT()
and DECLARE_TRACE() correctly.

To fix this the tracepoint.h file moves the define of DECLARE_TRACE()
out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the
define of the TRACE_EVENT()). This way the define_trace.h will undef
the DECLARE_TRACE() at the end and allow new headers to start
from scratch.

This patch also requires fixing the include/events/napi.h

It currently uses DECLARE_TRACE() and should be converted to a TRACE_EVENT()
format. But I'll leave that change to the authors of that file.
But since the napi.h file depends on using the CREATE_TRACE_POINTS
and does not define its own DEFINE_TRACE() it must use the define_trace.h
method instead.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2e26ca71

04 5月, 2010 1 次提交

tracing: Convert nop macros to static inlines · 4dbf6bc2

由 Steven Rostedt 提交于 5月 04, 2010

The ftrace.h file contains several functions as macros when the
functions are disabled due to config options. This patch converts
most of them to static inlines.

There are two exceptions:

  register_ftrace_function() and unregister_ftrace_function()

This is because their parameter "ops" must not be evaluated since
code using the function is allowed to #ifdef out the creation of
the parameter.

This also fixes an error caused by recent changes:

 kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
 kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4dbf6bc2

29 4月, 2010 2 次提交

sctp: Fix oops when sending queued ASCONF chunks · c0786693

由 Vlad Yasevich 提交于 4月 28, 2010

When we finish processing ASCONF_ACK chunk, we try to send
the next queued ASCONF.  This action runs the sctp state
machine recursively and it's not prepared to do so.

kernel BUG at kernel/timer.c:790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
EIP is at add_timer+0xd/0x1b
EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
Stack:
 c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
<0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
00000004
<0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
000000d0
Call Trace:
 [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
 [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
 [<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
 [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
 [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
 [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
 [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
 [<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
 [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
 [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
 [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]
Tested-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: NYuansong Qiao <ysqiao@research.ait.ie>
Signed-off-by: NShuaijun Zhang <szhang@research.ait.ie>
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0786693

sctp: avoid irq lock inversion while call sk->sk_data_ready() · 561b1733

由 Wei Yongjun 提交于 4月 28, 2010

sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
not be used in this case. Therefore, we have to make a new function
sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
 (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
 (slock-AF_INET){+.-...}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
1 lock held by sctp_darn/1517:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

561b1733

28 4月, 2010 3 次提交

coda: move backing-dev.h kernel include inside __KERNEL__ · 33f60e96

由 Jens Axboe 提交于 4月 28, 2010

Otherwise we must export backing-dev.h as well, which doesn't make
any sense.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

33f60e96

ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd

由 David Miller 提交于 4月 20, 2010

When performing a non-consuming read, a synchronize_sched() is
performed once for every cpu which is actively tracing.

This is very expensive, and can make it take several seconds to open
up the 'trace' file with lots of cpus.

Only one synchronize_sched() call is actually necessary.  What is
desired is for all cpus to see the disabling state change.  So we
transform the existing sequence:

	for_each_cpu() {
		ring_buffer_read_start();
	}

where each ring_buffer_start() call performs a synchronize_sched(),
into the following:

	for_each_cpu() {
		ring_buffer_read_prepare();
	}
	ring_buffer_read_prepare_sync();
	for_each_cpu() {
		ring_buffer_read_start();
	}

wherein only the single ring_buffer_read_prepare_sync() call needs to
do the synchronize_sched().

The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
memory and increments ->record_disabled.

In the second phase, ring_buffer_read_prepare_sync() makes sure this
->record_disabled state is visible fully to all cpus.

And in the final third phase, the ring_buffer_read_start() calls reset
the 'iter' objects allocated in the first phase since we now know that
none of the cpus are adding trace entries any more.

This makes openning the 'trace' file nearly instantaneous on a
sparc64 Niagara2 box with 128 cpus tracing.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

72c9ddfd

tracing: Add graph output support for irqsoff tracer · 62b915f1

由 Jiri Olsa 提交于 4月 02, 2010

Add function graph output to irqsoff tracer.

The graph output is enabled by setting new 'display-graph' trace option.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

62b915f1

25 4月, 2010 2 次提交

Catch filesystems lacking s_bdi · 5129a469

由 Jörn Engel 提交于 4月 25, 2010

noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.
Signed-off-by: NJoern Engel <joern@logfs.org>

Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
to the noop_backing_dev_info is not legal and will not result in
them being flushed, but we already catch this condition in
__mark_inode_dirty() when checking for a registered bdi.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5129a469

hugetlb: fix infinite loop in get_futex_key() when backed by huge pages · 23be7468

由 Mel Gorman 提交于 4月 23, 2010

If a futex key happens to be located within a huge page mapped
MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
page->mapping that will never exist.

See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
about the problem.

This patch makes page->mapping a poisoned value that includes
PAGE_MAPPING_ANON mapped MAP_PRIVATE.  This is enough for futex to
continue but because of PAGE_MAPPING_ANON, the poisoned value is not
dereferenced or used by futex.  No other part of the VM should be
dereferencing the page->mapping of a hugetlbfs page as its page cache is
not on the LRU.

This patch fixes the problem with the test case described in the bugzilla.

[akpm@linux-foundation.org: mel cant spel]
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NDarren Hart <darren@dvhart.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

23be7468

24 4月, 2010 1 次提交

Cleanup generic block based fiemap · 3a3076f4

由 Josef Bacik 提交于 4月 23, 2010

This cleans up a few of the complaints of __generic_block_fiemap. I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned. It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3a3076f4

23 4月, 2010 2 次提交

sched: Pre-compute cpumask_weight(sched_domain_span(sd)) · 669c55e9

由 Peter Zijlstra 提交于 4月 16, 2010

Dave reported that his large SPARC machines spend lots of time in
hweight64(), try and optimize some of those needless cpumask_weight()
invocations (esp. with the large offstack cpumasks these are very
expensive indeed).
Reported-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

669c55e9

NFS: Fix an unstable write data integrity race · 71d0a611

由 Trond Myklebust 提交于 4月 22, 2010

Commit 2c61be0a (NFS: Ensure that the WRITE
and COMMIT RPC calls are always uninterruptible) exposed a race on file
close. In order to ensure correct close-to-open behaviour, we want to wait
for all outstanding background commit operations to complete.

This patch adds an inode flag that indicates if a commit operation is under
way, and provides a mechanism to allow ->write_inode() to wait for its
completion if this is a data integrity flush.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

71d0a611

22 4月, 2010 5 次提交

smbfs: add bdi backing to mount session · 424264b7

由 Jens Axboe 提交于 4月 22, 2010

This ensures that dirty data gets flushed properly.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

424264b7

ncpfs: add bdi backing to mount session · f1970c73

由 Jens Axboe 提交于 4月 22, 2010

This ensures that dirty data gets flushed properly.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

f1970c73

coda: add bdi backing to mount session · 5163d900

由 Jens Axboe 提交于 4月 22, 2010

This ensures that dirty data gets flushed properly.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5163d900

bdi: add helper function for doing init and register of a bdi for a file system · c3c53206

由 Jens Axboe 提交于 4月 22, 2010

Pretty trivial helper, just sets up the bdi and registers it. An atomic
sequence count is used to ensure that the registered sysfs names are
unique.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c3c53206

tracing: Dump either the oops's cpu source or all cpus buffers · cecbca96

由 Frederic Weisbecker 提交于 4月 18, 2010

The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.

It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.

Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.

This patch modifies ftrace_dump_on_oops to handle this choice.

The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.

Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.

v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>

cecbca96

21 4月, 2010 1 次提交

pcmcia: pcmcia_dev_present bugfix · 04de0816

由 Dominik Brodowski 提交于 4月 20, 2010

pcmcia_dev_present is in and by itself buggy. Add a note specifying
why it is broken, and replace the broken locking -- taking a mutex
is a bad idea in IRQ context, from which this function is rarely
called -- by an atomic_t.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

04de0816

20 4月, 2010 2 次提交

KVM: Increase NR_IOBUS_DEVS limit to 200 · e80e2a60

由 Sridhar Samudrala 提交于 3月 30, 2010

This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.
Signed-off-by: NSridhar Samudrala <sri@us.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

e80e2a60

KVM: fix the handling of dirty bitmaps to avoid overflows · 87bf6e7d

由 Takuya Yoshikawa 提交于 4月 12, 2010

Int is not long enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
  __set_bit() takes the offset as int, not long.
Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

87bf6e7d

19 4月, 2010 3 次提交

regulator: Let drivers know when they use the stub API · be1a50d4

由 Jean Delvare 提交于 4月 03, 2010

Have the stub variant of regulator_get() return NULL, so that drivers
can (but still don't have to) handle this case specifically.
Signed-off-by: NJean Delvare <khali@linux-fr.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Jerome Oufella <jerome.oufella@savoirfairelinux.com>
Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: NLiam Girdwood <lrg@slimlogic.co.uk>

be1a50d4

drm/radeon/kms: add FireMV 2400 PCI ID. · 79b9517a

由 Dave Airlie 提交于 4月 19, 2010

This is an M24/X600 chip.

From RH# 581927

cc: stable@kernel.org
Signed-off-by: NDave Airlie <airlied@redhat.com>

79b9517a

rcu: Make RCU lockdep check the lockdep_recursion variable · bc293d62

由 Paul E. McKenney 提交于 4月 15, 2010

The lockdep facility temporarily disables lockdep checking by
incrementing the current->lockdep_recursion variable.  Such
disabling happens in NMIs and in other situations where lockdep
might expect to recurse on itself.

This patch therefore checks current->lockdep_recursion, disabling RCU
lockdep splats when this variable is non-zero.  In addition, this patch
removes the "likely()", as suggested by Lai Jiangshan.
Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
Reported-by: NDavid Miller <davem@davemloft.net>
Tested-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bc293d62

16 4月, 2010 1 次提交

firewire: cdev: fix cut+paste mistake in disclaimer · a2612cb1

由 Stefan Richter 提交于 4月 15, 2010

This was supposed to be generic "authors or copyright holders";
I mistakenly picked up text from a wrong file.
Reported-by: NDaniel K. <dk@uw.no>
Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>

a2612cb1

15 4月, 2010 1 次提交

firewire: cdev: change license of exported header files to MIT license · 19b3eecc

由 Stefan Richter 提交于 4月 11, 2010

Among else, this allows projects like libdc1394 to carry copies of the
ABI related header files without them or distributors having to worry
about effects on the project's overall license terms.  Switch to MIT
license as suggested by Kristian.  Also update the year in the
copyright statement according to source history.

Cc: Jay Fenlason <fenlason@redhat.com>
Acked-by: NClemens Ladisch <clemens@ladisch.de>
Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: NKristian Høgsberg <krh@bitplanet.net>

19b3eecc

14 4月, 2010 2 次提交

rcu: Better explain the condition parameter of rcu_dereference_check() · c08c68dd

由 David Howells 提交于 4月 09, 2010

Better explain the condition parameter of
rcu_dereference_check() that describes the conditions under
which the dereference is permitted to take place (and
incorporate Yong Zhang's suggestion).  This condition is only
checked under lockdep proving.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: eric.dumazet@gmail.com
LKML-Reference: <1270852752-25278-2-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c08c68dd

rcu: Add rcu_access_pointer and rcu_dereference_protected · b62730ba

由 Paul E. McKenney 提交于 4月 09, 2010

This patch adds variants of rcu_dereference() that handle
situations where the RCU-protected data structure cannot change,
perhaps due to our holding the update-side lock, or where the
RCU-protected pointer is only to be fetched, not dereferenced.
These are needed due to some performance concerns with using
rcu_dereference() where it is not required, aside from the need
for lockdep/sparse checking.

The new rcu_access_pointer() primitive is for the case where the
pointer is be fetch and not dereferenced.  This primitive may be
used without protection, RCU or otherwise, due to the fact that
it uses ACCESS_ONCE().

The new rcu_dereference_protected() primitive is for the case
where updates are prevented, for example, due to holding the
update-side lock.  This primitive does neither ACCESS_ONCE() nor
smp_read_barrier_depends(), so can only be used when updates are
somehow prevented.
Suggested-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <1270852752-25278-1-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b62730ba

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年