提交 · 56992309ccbe71f4321ddd50ee2f76f91b412c1a · openeuler / raspberrypi-kernel

12 11月, 2009 4 次提交

sysctl kernel: Remove binary sysctl logic · 56992309

由 Eric W. Biederman 提交于 11月 05, 2009

Now that sys_sysctl is a generic wrapper around /proc/sys  .ctl_name
and .strategy members of sysctl tables are dead code.  Remove them.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

56992309

sysctl binary: Reorder the tests to process wild card entries first. · 757010f0

由 Eric W. Biederman 提交于 11月 12, 2009

A malicious user could have passed in a ctl_name of 0 and triggered
the well know ctl_name to procname mapping code, instead of the wild
card matching code. This is a slight problem as wild card entries don't
have procnames, and because in some alternate universe a network device
might have ifindex 0. So test for and handle wild card entries first.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

757010f0

sysctl: sysctl_binary.c Fix compilation when !CONFIG_NET · 63395b65

由 Eric W. Biederman 提交于 11月 12, 2009

dev_get_by_index does not exist when the network stack is not
compiled in, so only include the code to follow wild card paths
when the network stack is present.

I have shuffled the code around a little to make it clear
that dev_put is called after dev_get_by_index showing that
there is no leak.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

63395b65

sysctl: Warn about all uses of sys_sysctl. · 2fb10732

由 Eric W. Biederman 提交于 11月 11, 2009

Now that the glibc pthread implemenation no longers uses sysctl() users
of sysctl are as rare as hen's teeth.  So remove the glibc exception
from the warning, and use the standard printk_ratelimit instead of
rolling our own.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

2fb10732

11 11月, 2009 5 次提交

sysctl: Don't look at ctl_name and strategy in the generic code · 2315ffa0

由 Eric W. Biederman 提交于 4月 03, 2009

The ctl_name and strategy fields are unused, now that sys_sysctl
is a compatibility wrapper around /proc/sys.  No longer looking
at them in the generic code is effectively what we are doing
now and provides the guarantee that during further cleanups
we can just remove references to those fields and everything
will work ok.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

2315ffa0

sysctl: Remove references to ctl_name and strategy from the generic sysctl table · 6fce56ec

由 Eric W. Biederman 提交于 4月 03, 2009

Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

6fce56ec

sysctl: Remove dead code from sysctl_check · 83ac201b

由 Eric W. Biederman 提交于 4月 03, 2009

Now that the sys_sysctl is now a compatibility wrapper around
/proc/sys we can remove much of sysctl_check and reduce it
to a few remaining sanity checks.  This completely decouples
it from the binary sysctl system call.

Little things like ensuring that the sysctl has not already
been registered are all that remain.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

83ac201b

sysctl: Neuter the generic sysctl strategy routines. · a965cf94

由 Eric W. Biederman 提交于 4月 03, 2009

Now that sys_sysctl is a compatibility layer on top of /proc/sys
these routines are never called but are still put in sysctl
tables so I have reduced them to stubs until they can be
removed entirely.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

a965cf94

sysctl: Reduce sys_sysctl to a compatibility wrapper around /proc/sys · 26a7034b

由 Eric W. Biederman 提交于 11月 05, 2009

To simply maintenance and to be able to remove all of the binary
sysctl support from various subsystems I have rewritten the binary
sysctl code as a compatibility wrapper around proc/sys.

The code is built around a hard coded table based on the table
in sysctl_check.c that lists all of our current binary sysctls
and provides enough information to convert from the sysctl
binary input into into ascii and back again.  New in this
patch is the realization that the only dynamic entries
that need to be handled have ifname as the asscii string
and ifindex as their ctl_name.

When a sys_sysctl is called the code now looks in the
translation table converting the binary name to the
path under /proc where the value is to be found.  Opens
that file, and calls into a format conversion wrapper
that calls fop->read and then fop->write as appropriate.

Since in practice the practically no one uses or tests
sys_sysctl rewritting the code to be beautiful is a little
silly.  The redeeming merit of this work is it allows us to
rip out all of the binary sysctl syscall support from
everywhere else in the tree.  Allowing us to remove
a lot of dead (after this patch) and barely maintained code.

In addition it becomes much easier to optimize the sysctl
implementation for being the backing store of /proc/sys,
without having to worry about sys_sysctl.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

26a7034b

06 11月, 2009 5 次提交

sysctl: Make do_sysctl static · 642c6d94

由 Eric W. Biederman 提交于 4月 03, 2009

Now that all of the architectures use compat_sys_sysctl do_sysctl
can become static.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

642c6d94

sysctl: Remove the cond_syscall entry for sys32_sysctl · 942405f3

由 Eric W. Biederman 提交于 4月 03, 2009

Now that all architechtures are use compat_sys_sysctl and sys32_sysctl
does not exist there is not point in retaining a cond_syscall
entry for it.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

942405f3

sysctl: Introduce a generic compat sysctl sysctl · da3f6f9b

由 Eric W. Biederman 提交于 4月 03, 2009

This uses compat_alloc_userspace to remove the various
hacks to allow do_sysctl to write to throuh oldlenp.

The rest of our mature compat syscall helper facitilies
are used as well to ensure we have a nice clean maintainable
compat syscall that can be used on all architectures.

The motiviation for a generic compat sysctl (besides the
obvious hack removal) is to reduce the number of compat
sysctl defintions out there so I can refactor the
binary sysctl implementation.

ppc already used the name compat_sys_sysctl so I remove the
ppcs version here.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

da3f6f9b

sysctl: Refactor the binary sysctl handling to remove duplicate code · 2830b683

由 Eric W. Biederman 提交于 4月 03, 2009

Read in the binary sysctl path once, instead of reread it
from user space each time the code needs to access a path
element.

The deprecated sysctl warning is moved to do_sysctl so
that the compat_sysctl entries syscalls will also warn.

The return of -ENOSYS when !CONFIG_SYSCTL_SYSCALL is moved
to binary_sysctl.  Always leaving a do_sysctl available
that handles !CONFIG_SYSCTL_SYSCALL and printing the
deprecated sysctl warning allows for a single defitition
of the sysctl syscall.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

2830b683

sysctl: Separate the binary sysctl logic into it's own file. · afa588b2

由 Eric W. Biederman 提交于 4月 02, 2009

In preparation for more invasive cleanups separate the core
binary sysctl logic into it's own file.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

afa588b2

03 11月, 2009 4 次提交

Correct nr_processes() when CPUs have been unplugged · 1d510750

由 Ian Campbell 提交于 11月 03, 2009

nr_processes() returns the sum of the per cpu counter process_counts for
all online CPUs. This counter is incremented for the current CPU on
fork() and decremented for the current CPU on exit(). Since a process
does not necessarily fork and exit on the same CPU the process_count for
an individual CPU can be either positive or negative and effectively has
no meaning in isolation.

Therefore calculating the sum of process_counts over only the online
CPUs omits the processes which were started or stopped on any CPU which
has since been unplugged. Only the sum of process_counts across all
possible CPUs has meaning.

The only caller of nr_processes() is proc_root_getattr() which
calculates the number of links to /proc as
        stat->nlink = proc_root.nlink + nr_processes();

You don't have to be all that unlucky for the nr_processes() to return a
negative value leading to a negative number of links (or rather, an
apparently enormous number of links). If this happens then you can get
failures where things like "ls /proc" start to fail because they got an
-EOVERFLOW from some stat() call.

Example with some debugging inserted to show what goes on:
        # ps haux|wc -l
        nr_processes: CPU0:     90
        nr_processes: CPU1:     1030
        nr_processes: CPU2:     -900
        nr_processes: CPU3:     -136
        nr_processes: TOTAL:    84
        proc_root_getattr. nlink 12 + nr_processes() 84 = 96
        84
        # echo 0 >/sys/devices/system/cpu/cpu1/online
        # ps haux|wc -l
        nr_processes: CPU0:     85
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -953
        proc_root_getattr. nlink 12 + nr_processes() -953 = -941
        75
        # stat /proc/
        nr_processes: CPU0:     84
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -954
        proc_root_getattr. nlink 12 + nr_processes() -954 = -942
          File: `/proc/'
          Size: 0               Blocks: 0          IO Block: 1024   directory
        Device: 3h/3d   Inode: 1           Links: 4294966354
        Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
        Access: 2009-11-03 09:06:55.000000000 +0000
        Modify: 2009-11-03 09:06:55.000000000 +0000
        Change: 2009-11-03 09:06:55.000000000 +0000

I'm not 100% convinced that the per_cpu regions remain valid for offline
CPUs, although my testing suggests that they do. If not then I think the
correct solution would be to aggregate the process_count for a given CPU
into a global base value in cpu_down().

This bug appears to pre-date the transition to git and it looks like it
may even have been present in linux-2.6.0-test7-bk3 since it looks like
the code Rusty patched in http://lwn.net/Articles/64773/ was already
wrong.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d510750

PM / Hibernate: Add newline to load_image() fail path · bf9fd67a

由 Jiri Slaby 提交于 10月 28, 2009

Finish a line by \n when load_image fails in the middle of loading.
Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

bf9fd67a

PM / Hibernate: Fix error handling in save_image() · 4ff277f9

由 Jiri Slaby 提交于 10月 28, 2009

There are too many retval variables in save_image(). Thus error return
value from snapshot_read_next() may be ignored and only part of the
snapshot (successfully) written.

Remove 'error' variable, invert the condition in the do-while loop
and convert the loop to use only 'ret' variable.

Switch the rest of the function to consider only 'ret'.

Also make sure we end printed line by \n if an error occurs.
Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

4ff277f9

PM / Hibernate: Fix blkdev refleaks · 76b57e61

由 Jiri Slaby 提交于 10月 07, 2009

While cruising through the swsusp code I found few blkdev reference
leaks of resume_bdev.

swsusp_read: remove blkdev_put altogether. Some fail paths do
             not do that.
swsusp_check: make sure we always put a reference on fail paths
software_resume: all fail paths between swsusp_check and swsusp_read
                 omit swsusp_close. Add it in those cases. And since
                 swsusp_read doesn't drop the reference anymore, do
                 it here unconditionally.

[rjw: Fixed a small coding style issue.]
Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

76b57e61

29 10月, 2009 7 次提交

sysctl: fix false positives when PROC_SYSCTL=n · 8c85dd87

由 Alexey Dobriyan 提交于 10月 26, 2009

Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
people use such combination to reduce ifdefs with non-standard handlers.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Reported-by: Peter Teoh <htmldeveloper@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c85dd87

cgroup: fix strstrip() misuse · 478988d3

由 KOSAKI Motohiro 提交于 10月 26, 2009

cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip().  it makes small inconsistent behavior.

example:
=========================
 # cd /mnt/cgroup/hoge
 # cat memory.swappiness
 60
 # echo "59 " > memory.swappiness
 # cat memory.swappiness
 59
 # echo " 58" > memory.swappiness
 bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

478988d3

connector: fix regression introduced by sid connector · 0d0df599

由 Christian Borntraeger 提交于 10月 26, 2009

Since commit 02b51df1 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
 [<000000000048a946>] sk_filter+0x9a/0xd0
 [<000000000049d938>] netlink_broadcast+0x2c0/0x53c
 [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
 [<00000000003baef0>] proc_sid_connector+0xc4/0xd4
 [<0000000000142604>] __set_special_pids+0x58/0x90
 [<0000000000159938>] sys_setsid+0xb4/0xd8
 [<00000000001187fe>] sysc_noemu+0x10/0x16
 [<00000041616cb266>] 0x41616cb266

The warning is
--->    WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Cc: Scott James Remnant <scott@ubuntu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NEvgeniy Polyakov <zbr@ioremap.net>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0d0df599

param: fix setting arrays of bool · 3c7d76e3

由 Rusty Russell 提交于 10月 29, 2009

We create a dummy struct kernel_param on the stack for parsing each
array element, but we didn't initialize the flags word.  This matters
for arrays of type "bool", where the flag indicates if it really is
an array of bools or unsigned int (old-style).
Reported-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org

3c7d76e3

param: fix NULL comparison on oom · d553ad86

由 Rusty Russell 提交于 10月 29, 2009

kp->arg is always true: it's the contents of that pointer we care about.
Reported-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org

d553ad86

param: fix lots of bugs with writing charp params from sysfs, by leaking mem. · 65afac7d

由 Rusty Russell 提交于 10月 29, 2009

e180a6b7 "param: fix charp parameters set via sysfs" fixed the case
where charp parameters written via sysfs were freed, leaving drivers
accessing random memory.

Unfortunately, storing a flag in the kparam struct was a bad idea: it's
rodata so setting it causes an oops on some archs.  But that's not all:

1) module_param_array() on charp doesn't work reliably, since we use an
   uninitialized temporary struct kernel_param.
2) there's a fundamental race if a module uses this parameter and then
   it's changed: they will still access the old, freed, memory.

The simplest fix (ie. for 2.6.32) is to never free the memory.  This
prevents all these problems, at cost of a memory leak.  In practice, there
are only 18 places where a charp is writable via sysfs, and all are
root-only writable.
Reported-by: NTakashi Iwai <tiwai@suse.de>
Cc: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org

65afac7d

futex: Fix spurious wakeup for requeue_pi really · 11df6ddd

由 Thomas Gleixner 提交于 10月 28, 2009

The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test) nor does it use the wake_list of futex_wake() which where
the reason for commit 41890f2 (futex: Handle spurious wake up)

See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com>

The changes in this fix to the wait_requeue_pi path were considered to
be a likely unecessary, but harmless safety net. But it turns out that
due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
as EAGAIN we built an endless loop in the code path which returns
correctly EWOULDBLOCK.

Spurious wakeups in wait_requeue_pi code path are unlikely so we do
the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
it deal with the spurious wakeup.

Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

11df6ddd

28 10月, 2009 1 次提交

sched: move rq_weight data array out of .percpu · 4a6cc4bd

由 Jiri Kosina 提交于 10月 29, 2009

Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it
percpu.

The per-cpu handling doesn't impose significant performance penalty on
potentially contented path in tg_shares_up().

...
ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
ffffffff81043383:       00 00
ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff81043393:       00
ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff8104339b:       00
ffffffff8104339c:       48 01 d0                add    %rdx,%rax
ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
ffffffff810433a6:       00
ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
...

After:

...
ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
ffffffff81043383:       00
ffffffff81043384:       48 98                   cltq
ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
ffffffff8104338d:       00
ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
ffffffff8104339c:       81
ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff810433a4:       00
ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff810433b5:       00
ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
...
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NTejun Heo <tj@kernel.org>

4a6cc4bd

24 10月, 2009 4 次提交

tracing: Remove cpu arg from the rb_time_stamp() function · 6d3f1e12

由 Jiri Olsa 提交于 10月 23, 2009

The cpu argument is not used inside the rb_time_stamp() function.
Plus fix a typo.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233647.118547500@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6d3f1e12

tracing: Fix comment typo and documentation example · 67b394f7

由 Jiri Olsa 提交于 10月 23, 2009

Trivial patch to fix a documentation example and to fix a
comment.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.871719877@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

67b394f7

tracing: Fix trace_seq_printf() return value · 3e69533b

由 Jiri Olsa 提交于 10月 23, 2009

trace_seq_printf() return value is a little ambiguous. It
currently returns the length of the space available in the
buffer. printf usually returns the amount written. This is not
adequate here, because:

  trace_seq_printf(s, "");

is perfectly legal, and returning 0 would indicate that it
failed.

We can always see the amount written by looking at the before
and after values of s->len. This is not quite the same use as
printf. We only care if the string was successfully written to
the buffer or not.

Make trace_seq_printf() return 0 if the trace oversizes the
buffer's free space, 1 otherwise.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.631787612@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3e69533b

tracing: Update *ppos instead of filp->f_pos · cf8517cf

由 Jiri Olsa 提交于 10月 23, 2009

Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.399670810@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cf8517cf

23 10月, 2009 2 次提交

perf events: Don't generate events for the idle task when exclude_idle is set · 54f44076

由 Soeren Sandmann 提交于 10月 22, 2009

Getting samples for the idle task is often not interesting, so
don't generate them when exclude_idle is set for the event in
question.
Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <ye8pr8fmlq7.fsf@camel16.daimi.au.dk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

54f44076

perf events: Fix swevent hrtimer sampling by keeping track of remaining time... · 721a669b

由 Soeren Sandmann 提交于 9月 15, 2009

perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers

Make the hrtimer based events work for sysprof.

Whenever a swevent is scheduled out, the hrtimer is canceled.
When it is scheduled back in, the timer is restarted. This
happens every scheduler tick, which means the timer never
expired because it was getting repeatedly restarted over and
over with the same period.

To fix that, save the remaining time when disabling; when
reenabling, use that saved time as the period instead of the
user-specified sampling period.

Also, move the starting and stopping of the hrtimers to helper
functions instead of duplicating the code.
Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

721a669b

22 10月, 2009 1 次提交

PM: Make warning in suspend_test_finish() less likely to happen · 04bf7539

由 Rafael J. Wysocki 提交于 10月 20, 2009

Increase TEST_SUSPEND_SECONDS to 10 so the warning in
suspend_test_finish() doesn't annoy the users of slower systems so much.

Also, make the warning print the suspend-resume cycle time, so that we
know why the warning actually triggered.

Patch prepared during the hacking session at the Kernel Summit in Tokyo.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

04bf7539

19 10月, 2009 1 次提交

HWPOISON: Allow schedule_on_each_cpu() from keventd · 65a64464

由 Andi Kleen 提交于 10月 14, 2009

Right now when calling schedule_on_each_cpu() from keventd there
is a deadlock because it tries to schedule a work item on the current CPU
too. This happens via lru_add_drain_all() in hwpoison.

Just call the function for the current CPU in this case. This is actually
faster too.

Debugging with Fengguang Wu & Max Asbock
Signed-off-by: NAndi Kleen <ak@linux.intel.com>

65a64464

16 10月, 2009 2 次提交

futex: Move drop_futex_key_refs out of spinlock'ed region · 89061d3d

由 Darren Hart 提交于 10月 15, 2009

When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.

Dropping the reference may ultimately lead to a call to
"iput_final" and subsequently call into filesystem- specific code -
which may be non-atomic.

It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.

Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: NDarren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

89061d3d

rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang · 237c80c5

由 Paul E. McKenney 提交于 10月 15, 2009

If the following sequence of events occurs, then
TREE_PREEMPT_RCU will hang waiting for a grace period to
complete, eventually OOMing the system:

o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
	with more than 64 physical CPUs present (32 on a 32-bit system).
	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
	with RCU_FANOUT set to a sufficiently small value that the
	physical CPUs populate two or more leaf rcu_node structures.

o	A task is preempted in an RCU read-side critical section
	while running on a CPU corresponding to a given leaf rcu_node
	structure.

o	All CPUs corresponding to this same leaf rcu_node structure
	record quiescent states for the current grace period.

o	All of these same CPUs go offline (hence the need for enough
	physical CPUs to populate more than one leaf rcu_node structure).
	This causes the preempted task to be moved to the root rcu_node
	structure.

At this point, there is nothing left to cause the quiescent
state to be propagated up the rcu_node tree, so the current
grace period never completes.

The simplest fix, especially after considering the deadlock
possibilities, is to detect this situation when the last CPU is
offlined, and to set that CPU's ->qsmask bit in its leaf
rcu_node structure.  This will cause the next invocation of
force_quiescent_state() to end the grace period.

Without this fix, this hang can be triggered in an hour or so on
some machines with rcutorture and random CPU onlining/offlining.
With this fix, these same machines pass a full 10 hours of this
sort of abuse.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

237c80c5

15 10月, 2009 4 次提交

rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU · 019129d5

由 Paul E. McKenney 提交于 10月 14, 2009

For the short term, map synchronize_rcu_expedited() to
synchronize_rcu() for TREE_PREEMPT_RCU and to
synchronize_sched_expedited() for TREE_RCU.

Longer term, there needs to be a real expedited grace period for
TREE_PREEMPT_RCU, but candidate patches to date are considerably
more complex and intrusive.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592331-git-send-email->
Signed-off-by: NIngo Molnar <mingo@elte.hu>

019129d5

rcu: Prevent RCU IPI storms in presence of high call_rcu() load · 37c72e56

由 Paul E. McKenney 提交于 10月 14, 2009

As the number of callbacks on a given CPU rises, invoke
force_quiescent_state() only every blimit number of callbacks
(defaults to 10,000), and even then only if no other CPU has
invoked force_quiescent_state() in the meantime.

This should fix the performance regression reported by Nick.
Reported-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592133-git-send-email->
Signed-off-by: NIngo Molnar <mingo@elte.hu>

37c72e56

workqueue: add 'flush_delayed_work()' to run and wait for delayed work · 8c53e463

由 Linus Torvalds 提交于 10月 14, 2009

It basically turns a delayed work into an immediate work, and then waits
for it to finish, thus allowing you to force (and wait for) an immediate
flush of a delayed work.

We'll want to use this in the tty layer to clean up tty_flush_to_ldisc().
Acked-by: NOleg Nesterov <oleg@redhat.com>
[ Fixed to use 'del_timer_sync()' as noted by Oleg ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c53e463

futex: Check for NULL keys in match_futex · 2bc87203

由 Darren Hart 提交于 10月 14, 2009

If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
it will find the futex_q->requeue_pi_key to be NULL and OOPS.

Check for NULL in match_futex() instead of doing explicit NULL pointer
checks on all call sites.  While match_futex(NULL, NULL) returning
false is a little odd, it's still correct as we expect valid key
references.
Signed-off-by: NDarren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

2bc87203