提交 · f201ae2356c74bcae130b2177b3dca903ea98071 · openeuler / Kernel

23 11月, 2008 1 次提交

tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically · f201ae23

由 Frederic Weisbecker 提交于 11月 23, 2008

Impact: use deeper function tracing depth safely

Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.

So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.

Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork

I chose a default depth of 50. I don't have overruns anymore.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f201ae23

20 11月, 2008 2 次提交

cpuset: update top cpuset's mems after adding a node · f481891f

由 Miao Xie 提交于 11月 19, 2008

After adding a node into the machine, top cpuset's mems isn't updated.

By reviewing the code, we found that the update function

  cpuset_track_online_nodes()

was invoked after node_states[N_ONLINE] changes.  It is wrong because
N_ONLINE just means node has pgdat, and if node has/added memory, we use
N_HIGH_MEMORY.  So, We should invoke the update function after
node_states[N_HIGH_MEMORY] changes, just like its commit says.

This patch fixes it.  And we use notifier of memory hotplug instead of
direct calling of cpuset_track_online_nodes().
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Acked-by: NYasunori Goto <y-goto@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul Menage <menage@google.com
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f481891f

reintroduce accept4 · de11defe

由 Ulrich Drepper 提交于 11月 19, 2008

Introduce a new accept4() system call.  The addition of this system call
matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.

The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument.  Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)

SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4().  This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec().  More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).

The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.

Here's a test program.  Works on x86-32.  Should work on x86-64, but
I (mtk) don't have a system to hand to test with.

It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().

I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.

/* test_accept4.c

  Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
       <mtk.manpages@gmail.com>

  Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define PORT_NUM 33333

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

/**********************************************************************/

/* The following is what we need until glibc gets a wrapper for
  accept4() */

/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC    O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK   O_NONBLOCK
#endif

#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif

static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
   printf("Calling accept4(): flags = %x", flags);
   if (flags != 0) {
       printf(" (");
       if (flags & SOCK_CLOEXEC)
           printf("SOCK_CLOEXEC");
       if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
           printf(" ");
       if (flags & SOCK_NONBLOCK)
           printf("SOCK_NONBLOCK");
       printf(")");
   }
   printf("\n");

#if USE_SOCKETCALL
   long args[6];

   args[0] = fd;
   args[1] = (long) sockaddr;
   args[2] = (long) addrlen;
   args[3] = flags;

   return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
   return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}

/**********************************************************************/

static int
do_test(int lfd, struct sockaddr_in *conn_addr,
       int closeonexec_flag, int nonblock_flag)
{
   int connfd, acceptfd;
   int fdf, flf, fdf_pass, flf_pass;
   struct sockaddr_in claddr;
   socklen_t addrlen;

   printf("=======================================\n");

   connfd = socket(AF_INET, SOCK_STREAM, 0);
   if (connfd == -1)
       die("socket");
   if (connect(connfd, (struct sockaddr *) conn_addr,
               sizeof(struct sockaddr_in)) == -1)
       die("connect");

   addrlen = sizeof(struct sockaddr_in);
   acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                      closeonexec_flag | nonblock_flag);
   if (acceptfd == -1) {
       perror("accept4()");
       close(connfd);
       return 0;
   }

   fdf = fcntl(acceptfd, F_GETFD);
   if (fdf == -1)
       die("fcntl:F_GETFD");
   fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
              ((closeonexec_flag & SOCK_CLOEXEC) != 0);
   printf("Close-on-exec flag is %sset (%s); ",
           (fdf & FD_CLOEXEC) ? "" : "not ",
           fdf_pass ? "OK" : "failed");

   flf = fcntl(acceptfd, F_GETFL);
   if (flf == -1)
       die("fcntl:F_GETFD");
   flf_pass = ((flf & O_NONBLOCK) != 0) ==
              ((nonblock_flag & SOCK_NONBLOCK) !=0);
   printf("nonblock flag is %sset (%s)\n",
           (flf & O_NONBLOCK) ? "" : "not ",
           flf_pass ? "OK" : "failed");

   close(acceptfd);
   close(connfd);

   printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
   return fdf_pass && flf_pass;
}

static int
create_listening_socket(int port_num)
{
   struct sockaddr_in svaddr;
   int lfd;
   int optval;

   memset(&svaddr, 0, sizeof(struct sockaddr_in));
   svaddr.sin_family = AF_INET;
   svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
   svaddr.sin_port = htons(port_num);

   lfd = socket(AF_INET, SOCK_STREAM, 0);
   if (lfd == -1)
       die("socket");

   optval = 1;
   if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                  sizeof(optval)) == -1)
       die("setsockopt");

   if (bind(lfd, (struct sockaddr *) &svaddr,
            sizeof(struct sockaddr_in)) == -1)
       die("bind");

   if (listen(lfd, 5) == -1)
       die("listen");

   return lfd;
}

int
main(int argc, char *argv[])
{
   struct sockaddr_in conn_addr;
   int lfd;
   int port_num;
   int passed;

   passed = 1;

   port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;

   memset(&conn_addr, 0, sizeof(struct sockaddr_in));
   conn_addr.sin_family = AF_INET;
   conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
   conn_addr.sin_port = htons(port_num);

   lfd = create_listening_socket(port_num);

   if (!do_test(lfd, &conn_addr, 0, 0))
       passed = 0;
   if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
       passed = 0;
   if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
       passed = 0;
   if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
       passed = 0;

   close(lfd);

   exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}

[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: NUlrich Drepper <drepper@redhat.com>
Tested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de11defe

18 11月, 2008 2 次提交

block: make add_partition() return pointer to hd_struct · ba32929a

由 Tejun Heo 提交于 11月 10, 2008

Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure.  This change will be used to fix md
autodetection bug.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

ba32929a

tracing/function-return-tracer: add the overrun field · 0231022c

由 Frederic Weisbecker 提交于 11月 17, 2008

Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0231022c

16 11月, 2008 12 次提交

tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8

由 Mathieu Desnoyers 提交于 11月 14, 2008

Impact: API *CHANGE*. Must update all tracepoint users.

Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
structure in a single spot for all the kernel. It helps reducing memory
consumption, especially when declaring a lot of tracepoints, e.g. for
kmalloc tracing.

*API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
to do it. The name previously used was misleading.

Updates scheduler instrumentation to follow this API change.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7e066fb8

tracepoints: do not put arguments in name · 5f382671

由 Mathieu Desnoyers 提交于 11月 14, 2008

Impact: cleanup

That's overkill, takes space. We have a global tracepoint registery in
header files anyway.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5f382671

tracepoints: use unregister return value · c420970e

由 Mathieu Desnoyers 提交于 11月 14, 2008

Impact: bugfix.

Unregistering a tracepoint can fail. Return the error value.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c420970e

tracepoints: use rcu_*_sched_notrace · da7b3eab

由 Mathieu Desnoyers 提交于 11月 14, 2008

Make sure tracepoints can be called within ftrace callbacks.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

da7b3eab

markers: create DEFINE_MARKER and GET_MARKER (new API) · a0bca6a5

由 Mathieu Desnoyers 提交于 11月 14, 2008

Impact: new API.

Allow markers to be used only for declaration, without function call
associated. Useful to create specialized probes.

The problem we had is that two function calls were required when one
wanted to put a marker in a tracepoint probe. Now the marker can be used
simply for trace data type declaration, leaving the trace write work
within the tracepoint probe without any additional function call.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a0bca6a5

markers: auto enable tracepoints (new API : trace_mark_tp()) · c1df1bd2

由 Mathieu Desnoyers 提交于 11月 14, 2008

Impact: new API

Add a new API trace_mark_tp(), which declares a marker within a
tracepoint probe. When the marker is activated, the tracepoint is
automatically enabled.

No branch test is used at the marker site, because it would be a
duplicate of the branch already present in the tracepoint.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c1df1bd2

markers: add missing stdargs.h include, needed due to va_list usage · e3f8c4b9

由 Arnaldo Carvalho de Melo 提交于 11月 14, 2008

Impact: build fix (for future changes)

That seemed to cause built issue when marker.h is included early, even
though stdargs.h is included in kernel.h.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e3f8c4b9

rcu: add rcu_read_*_sched_notrace() · 954e100d

由 Mathieu Desnoyers 提交于 11月 14, 2008

Impact: new API, useful for tracepoints and markers.

Add _notrace version to rcu_read_*_sched().
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: NPaul E McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

954e100d

tracing/function-return-tracer: support for dynamic ftrace on function return tracer · e7d3737e

由 Frederic Weisbecker 提交于 11月 16, 2008

This patch adds the support for dynamic tracing on the function return tracer.
The whole difference with normal dynamic function tracing is that we don't need
to hook on a particular callback. The only pro that we want is to nop or set
dynamically the calls to ftrace_caller (which is ftrace_return_caller here).

Some security checks ensure that we are not trying to launch dynamic tracing for
return tracing while normal function tracing is already running.

An example of trace with getnstimeofday set as a filter:

ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e7d3737e

ftrace: pass module struct to arch dynamic ftrace functions · 31e88909

由 Steven Rostedt 提交于 11月 14, 2008

Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.

Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.

The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.

This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:

  ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
       a nop, where the original text is calling addr. (mod is the
       module struct if called by module init)

  ftrace_make_caller(rec, addr) - convert the code rec->ip that should
       be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

31e88909

Fix inotify watch removal/umount races · 8f7b0ba1

由 Al Viro 提交于 11月 15, 2008

Inotify watch removals suck violently.

To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex.  That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount.  We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.

Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done.  Cleanup is just deactivate_super().

However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords.  That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.

We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e.  that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().

So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong.  We had
to drop ih->mutex before we could grab ->s_umount.  So the watch
could've been gone already.

That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer.  If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address.  That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that.  Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.

So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check.  If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.

And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Acked-by: NGreg KH <greg@kroah.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8f7b0ba1

Add 'pr_fmt()' format modifier to pr_xyz macros. · d091c2f5

由 Martin Schwidefsky 提交于 11月 12, 2008

A common reason for device drivers to implement their own printk macros
is the lack of a printk prefix with the standard pr_xyz macros.
Introduce a pr_fmt() macro that is applied for every pr_xyz macro to the
format string.

The most common use of the pr_fmt macro would be to add the name of the
device driver to all pr_xyz messages in a source file.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d091c2f5

14 11月, 2008 3 次提交

lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.c · e8f6fbf6

由 Ingo Molnar 提交于 11月 12, 2008

fix this warning:

  net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
  net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used

this is a lockdep macro problem in the !LOCKDEP case.

We cannot convert it to an inline because the macro works on multiple types,
but we can mark the parameter used.

[ also clean up a misaligned tab in sock_lock_init_class_and_name() ]

[ also remove #ifdefs from around af_family_clock_key strings - which
  were certainly added to get rid of the ugly build warnings. ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8f6fbf6

USB: don't register endpoints for interfaces that are going away · 352d0263

由 Alan Stern 提交于 10月 29, 2008

This patch (as1155) fixes a bug in usbcore.  When interfaces are
deleted, either because the device was disconnected or because of a
configuration change, the extra attribute files and child endpoint
devices may get left behind.  This is because the core removes them
before calling device_del().  But during device_del(), after the
driver is unbound the core will reinstall altsetting 0 and recreate
those extra attributes and children.

The patch prevents this by adding a flag to record when the interface
is in the midst of being unregistered.  When the flag is set, the
attribute files and child devices will not be created.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@kernel.org> [2.6.27, 2.6.26, 2.6.25]
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

352d0263

slab: document SLAB_DESTROY_BY_RCU · d7de4c1d

由 Peter Zijlstra 提交于 11月 13, 2008

Explain this SLAB_DESTROY_BY_RCU thing...

[hugh@veritas.com: add a pointer to comment in mm/slab.c]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>

d7de4c1d

13 11月, 2008 5 次提交

HID: map macbook keys for "Expose" and "Dashboard" · 437184ae

由 Henrik Rydberg 提交于 11月 04, 2008

On macbooks there are specific keys for the user-space functions Expose
and Dashboard, which currently has no counterpart in input.h. This patch
adds KEY_SCALE and KEY_DASHBOARD, and maps the keyboard accordingly.
Acked-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: NHenrik Rydberg <rydberg@euromail.se>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

437184ae

Add c2 port support · 4e17e1db

由 Rodolfo Giometti 提交于 11月 12, 2008

C2port implements a two wire serial communication protocol (bit
banging) designed to enable in-system programming, debugging, and
boundary-scan testing on low pin-count Silicon Labs devices.

Currently this code supports only flash programming through sysfs
interface but extensions shoud be easy to add.
Signed-off-by: NRodolfo Giometti <giometti@linux.it>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e17e1db

rtc: rtc-wm8350: add support for WM8350 RTC · 077eaf5b

由 Mark Brown 提交于 11月 12, 2008

This adds support for the RTC provided by the Wolfson Microelectronics
WM8350.

This driver was originally written by Graeme Gregory and Liam Girdwood,
though it has been modified since then to update it to current mainline
coding standards and for API completeness.

[akpm@linux-foundation.org: s/schedule_timeout_interruptible/schedule_timeout_uninterruptible/ to prevent bogus timeout when signal_pending()]
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Liam Girdwood <linux@wolfsonmicro.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

077eaf5b

remove ratelimt() · b76f90b5

由 Andrew Morton 提交于 11月 12, 2008

It mistakenly assumes that a static local in an inlined function is a
kernel-wide singleton.  It also has no callers, so let's remove it.

Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b76f90b5

trace: rename unlikely profiler to branch profiler · 2ed84eeb

由 Steven Rostedt 提交于 11月 12, 2008

Impact: name change of unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step farther, and renamed all the unlikely configurations
to a "BRANCH" variant.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2ed84eeb

12 11月, 2008 7 次提交

tracing: branch tracer, fix vdso crash · 2b7d0390

由 Ingo Molnar 提交于 11月 12, 2008

Impact: fix bootup crash

the branch tracer missed arch/x86/vdso/vclock_gettime.c from
disabling tracing, which caused such bootup crashes:

  [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077

also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
instrumentation on a per file basis.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2b7d0390

tracing: profile likely and unlikely annotations · 1f0d69a9

由 Steven Rostedt 提交于 11月 12, 2008

Impact: new unlikely/likely profiler

Andrew Morton recently suggested having an in-kernel way to profile
likely and unlikely macros. This patch achieves that goal.

When configured, every(*) likely and unlikely macro gets a counter attached
to it. When the condition is hit, the hit and misses of that condition
are recorded. These numbers can later be retrieved by:

  /debugfs/tracing/profile_likely    - All likely markers
  /debugfs/tracing/profile_unlikely  - All unlikely markers.

# cat /debug/tracing/profile_unlikely | head
 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
    2167        0   0 do_arch_prctl                  process_64.c         832
       0        0   0 do_arch_prctl                  process_64.c         804
    2670        0   0 IS_ERR                         err.h                34
   71230     5693   7 __switch_to                    process_64.c         673
   76919        0   0 __switch_to                    process_64.c         639
   43184    33743  43 __switch_to                    process_64.c         624
   12740    64181  83 __switch_to                    process_64.c         594
   12740    64174  83 __switch_to                    process_64.c         590

# cat /debug/tracing/profile_unlikely | \
  awk '{ if ($3 > 25) print $0; }' |head -20
   44963    35259  43 __switch_to                    process_64.c         624
   12762    67454  84 __switch_to                    process_64.c         594
   12762    67447  84 __switch_to                    process_64.c         590
    1478      595  28 syscall_get_error              syscall.h            51
       0     2821 100 syscall_trace_leave            ptrace.c             1567
       0        1 100 native_smp_prepare_cpus        smpboot.c            1237
   86338   265881  75 calc_delta_fair                sched_fair.c         408
  210410   108540  34 calc_delta_mine                sched.c              1267
       0    54550 100 sched_info_queued              sched_stats.h        222
   51899    66435  56 pick_next_task_fair            sched_fair.c         1422
       6       10  62 yield_task_fair                sched_fair.c         982
    7325     2692  26 rt_policy                      sched.c              144
       0     1270 100 pre_schedule_rt                sched_rt.c           1261
    1268    48073  97 pick_next_task_rt              sched_rt.c           884
       0    45181 100 sched_info_dequeued            sched_stats.h        177
       0       15 100 sched_move_task                sched.c              8700
       0       15 100 sched_move_task                sched.c              8690
   53167    33217  38 schedule                       sched.c              4457
       0    80208 100 sched_info_switch              sched_stats.h        270
   30585    49631  61 context_switch                 sched.c              2619

# cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
   39900    36577  47 pick_next_task                 sched.c              4397
   20824    15233  42 switch_mm                      mmu_context_64.h     18
       0        7 100 __cancel_work_timer            workqueue.c          560
     617    66484  99 clocksource_adjust             timekeeping.c        456
       0   346340 100 audit_syscall_exit             auditsc.c            1570
      38   347350  99 audit_get_context              auditsc.c            732
       0   345244 100 audit_syscall_entry            auditsc.c            1541
      38     1017  96 audit_free                     auditsc.c            1446
       0     1090 100 audit_alloc                    auditsc.c            862
    2618     1090  29 audit_alloc                    auditsc.c            858
       0        6 100 move_masked_irq                migration.c          9
       1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
       2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
       0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
    4514     2090  31 __grab_cache_page              filemap.c            2149
   12882   228786  94 mapping_unevictable            pagemap.h            50
       4       11  73 __flush_cpu_slab               slub.c               1466
  627757   330451  34 slab_free                      slub.c               1731
    2959    61245  95 dentry_lru_del_init            dcache.c             153
     946     1217  56 load_elf_binary                binfmt_elf.c         904
     102       82  44 disk_put_part                  genhd.h              206
       1        1  50 dst_gc_task                    dst.c                82
       0       19 100 tcp_mss_split_point            tcp_output.c         1126

As you can see by the above, there's a bit of work to do in rethinking
the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
that were more than 25%.

Note:  After submitting my first version of this patch, Andrew Morton
  showed me a version written by Daniel Walker, where I picked up
  the following ideas from:

  1)  Using __builtin_constant_p to avoid profiling fixed values.
  2)  Using __FILE__ instead of instruction pointers.
  3)  Using the preprocessor to stop all profiling of likely
       annotations from vsyscall_64.c.

Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
for their feed back on this patch.

(*) Not ever unlikely is recorded, those that are used by vsyscalls
 (a few of them) had to have profiling disabled.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1f0d69a9

tracing/fastboot: move boot tracer structs and funcs into their own header. · 3f5ec136

由 Frederic Weisbecker 提交于 11月 11, 2008

Impact: Cleanups on the boot tracer and ftrace

This patch bring some cleanups about the boot tracer headers. The
functions and structures of this tracer have nothing related to ftrace
and should have so their own header file.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3f5ec136

hrtimer: clean up unused callback modes · 621a0d52

由 Peter Zijlstra 提交于 11月 12, 2008

Impact: cleanup

git grep HRTIMER_CB_IRQSAFE revealed half the callback modes are actually
unused.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

621a0d52

serial: sh-sci: fix cannot work SH7723 SCIFA · 1a22f08d

由 Yoshihiro Shimoda 提交于 11月 11, 2008

SH7723 has SCIFA. This module is similer SCI register map, but it has FIFO.
So this patch adds new type(PORT_SCIFA) and change some type checking.
Signed-off-by: NYoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

1a22f08d

ring-buffer: buffer record on/off switch · a3583244

由 Steven Rostedt 提交于 11月 11, 2008

Impact: enable/disable ring buffer recording API added

Several kernel developers have requested that there be a way to stop
recording into the ring buffers with a simple switch that can also
be enabled from userspace. This patch addes a new kernel API to the
ring buffers called:

 tracing_on()
 tracing_off()

When tracing_off() is called, all ring buffers will not be able to record
into their buffers.

tracing_on() will enable the ring buffers again.

These two act like an on/off switch. That is, there is no counting of the
number of times tracing_off or tracing_on has been called.

A new file is added to the debugfs/tracing directory called

  tracing_on

This allows for userspace applications to also flip the switch.

  echo 0 > debugfs/tracing/tracing_on

disables the tracing.

  echo 1 > /debugfs/tracing/tracing_on

enables it.

Note, this does not disable or enable any tracers. It only sets or clears
a flag that needs to be set in order for the ring buffers to write to
their buffers. It is a global flag, and affects all ring buffers.

The buffers start out with tracing_on enabled.

There are now three flags that control recording into the buffers:

 tracing_on: which affects all ring buffer tracers.

 buffer->record_disabled: which affects an allocated buffer, which may be set
     if an anomaly is detected, and tracing is disabled.

 cpu_buffer->record_disabled: which is set by tracing_stop() or if an
     anomaly is detected. tracing_start can not reenable this if
     an anomaly occurred.

The userspace debugfs/tracing/tracing_enabled is implemented with
tracing_stop() but the user space code can not enable it if the kernel
called tracing_stop().

Userspace can enable the tracing_on even if the kernel disabled it.
It is just a switch used to stop tracing if a condition was hit.
tracing_on is not for protecting critical areas in the kernel nor is
it for stopping tracing if an anomaly occurred. This is because userspace
can reenable it at any time.

Side effect: With this patch, I discovered a dead variable in ftrace.c
  called tracing_on. This patch removes it.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>

a3583244

telephony: trivial: fix up email address · 0906dd9d

由 Alan Cox 提交于 11月 11, 2008

Signed-off-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0906dd9d

11 11月, 2008 4 次提交

tracing, x86: add low level support for ftrace return tracing · caf4b323

由 Frederic Weisbecker 提交于 11月 11, 2008

Impact: add infrastructure for function-return tracing

Add low level support for ftrace return tracing.

This plug-in stores return addresses on the thread_info structure of
the current task.

The index of the current return address is initialized when the task
is the first one (init) and when a process forks (the child). It is
not needed when a task does a sys_execve because after this syscall,
it still needs to return on the kernel functions it called.

Note that the code of return_to_handler has been suggested by Steven
Rostedt as almost all of the ideas of improvements in this V3.

For purpose of security, arch/x86/kernel/process_32.c is not traced
because __switch_to() changes the current task during its execution.
That could cause inconsistency in the stored return address of this
function even if I didn't have any crash after testing with tracing on
this function enabled.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

caf4b323

fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock · ad474cac

由 Oleg Nesterov 提交于 11月 10, 2008

Impact: fix hang/crash on ia64 under high load

This is ugly, but the simplest patch by far.

Unlike other similar routines, account_group_exec_runtime() could be
called "implicitly" from within scheduler after exit_notify(). This
means we can race with the parent doing release_task(), we can't just
check ->signal != NULL.

Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
before __cleanup_signal() to make sure ->signal can't be freed under
task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
about the case when tsk changes cpu/rq under us, this should be OK.

Thanks to Ingo who nacked my previous buggy patch.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Reported-by: NDoug Chapman <doug.chapman@hp.com>

ad474cac

ssb: Fix DMA-API compilation for non-PCI systems · fd0fcf5c

由 Michael Buesch 提交于 11月 06, 2008

This fixes compilation of the SSB DMA-API code on non-PCI platforms.
Signed-off-by: NMichael Buesch <mb@bu3sch.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd0fcf5c

libata: revert convert-to-block-tagging patches · 8a8bc223

由 Tejun Heo 提交于 11月 10, 2008

This patch reverts the following three commits which convert libata to
use block layer tagging.

 43a49cbd
 e013e13b
 2fca5ccf

Although using block layer tagging is the right direction, due to the
tight coupling among tag number, data structure allocation and
hardware command slot allocation, libata doesn't work correctly with
the current conversion.

The biggest problem is guaranteeing that tag 0 is always used for
non-NCQ commands.  Due to the way blk-tag is implemented and how SCSI
starts and finishes requests, such guarantee can't be made.  I'm not
sure whether this would actually break any low level driver but it
doesn't look like a good idea to break such assumption given the
frailty of ATA controllers.

So, for the time being, keep using the old dumb in-libata qc
allocation.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Jens Axobe <jens.axboe@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8a8bc223

10 11月, 2008 2 次提交

cpumask: introduce new API, without changing anything, v3 · 984f2f37

由 Rusty Russell 提交于 11月 08, 2008

Impact: cleanup

Clean up based on feedback from Andrew Morton and others:

 - change to inline functions instead of macros
 - add __init to bootmem method
 - add a missing debug check
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

984f2f37

clarify usage expectations for cnt32_to_63() · 058e3739

由 Nicolas Pitre 提交于 11月 09, 2008

Currently, all existing users of cnt32_to_63() are fine since the CPU
architectures where it is used don't do read access reordering, and user
mode preemption is disabled already.  It is nevertheless a good idea to
better elaborate usage requirements wrt preemption, and use an explicit
memory barrier on SMP to avoid different CPUs accessing the counter
value in the wrong order.  On UP a simple compiler barrier is
sufficient.
Signed-off-by: NNicolas Pitre <nico@marvell.com>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

058e3739

09 11月, 2008 1 次提交

mmc: struct device - replace bus_id with dev_name(), dev_set_name() · d1b26863

由 Kay Sievers 提交于 11月 08, 2008

Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NPierre Ossman <drzeus@drzeus.cx>

d1b26863

08 11月, 2008 1 次提交

ACPI video: if no ACPI backlight support, use vendor drivers · c3d6de69

由 Thomas Renninger 提交于 8月 01, 2008

If an ACPI graphics device supports backlight brightness functions (cmp. with
latest ACPI spec Appendix B), let the ACPI video driver control backlight and
switch backlight control off in vendor specific ACPI drivers (asus_acpi,
thinkpad_acpi, eeepc, fujitsu_laptop, msi_laptop, sony_laptop, acer-wmi).

Currently it is possible to load above drivers and let both poke on the
brightness HW registers, the video and vendor specific ACPI drivers -> bad.

This patch provides the basic support to check for BIOS capabilities before
driver loading time. Driver specific modifications are in separate follow up
patches.

"acpi_backlight=vendor"
	Prever vendor driver over ACPI driver for backlight.
"acpi_backlight=video" (default)
	Prever ACPI driver over vendor driver for backlight.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Acked-by: NZhang Rui <rui.zhang@intel.com>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

c3d6de69

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功