Commits · 2c42818e962e2858334bf45bfc56662b3752df34 · openanolis / cloud-kernel

29 Sep, 2011 5 commits

rcu: Abstract common code for RCU grace-period-wait primitives · 2c42818e

Authored by Paul E. McKenney on May 26, 2011

Pull the code that waits for an RCU grace period into a single function,
which is then called by synchronize_rcu() and friends in the case of
TREE_RCU and TREE_PREEMPT_RCU, and from rcu_barrier() and friends in
the case of TINY_RCU and TINY_PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Fix mismatched variable in rcutree_trace.c · f039d1f1

Authored by Andi Kleen on Jun 07, 2011

rcutree.c defines rcu_cpu_kthread_cpu as int, not unsigned int,
so the extern has to follow that.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Restore checks for blocking in RCU read-side critical sections · b3fbab05

Authored by Paul E. McKenney on May 24, 2011

Long ago, using TREE_RCU with PREEMPT would result in "scheduling
while atomic" diagnostics if you blocked in an RCU read-side critical
section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
this diagnostic. This commit therefore adds a replacement diagnostic
based on PROVE_RCU.

Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
used for things that have nothing to do with rcu_dereference(), rename
lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
argument that is a string indicating what is suspicious. This third
argument is passed in from a new third argument to rcu_lockdep_assert().
Update all calls to rcu_lockdep_assert() to add an informative third
argument.

Also, add a pair of rcu_lockdep_assert() calls from within
rcu_note_context_switch(), one complaining if a context switch occurs
in an RCU-bh read-side critical section and another complaining if a
context switch occurs in an RCU-sched read-side critical section.
These are present only if the PROVE_RCU kernel parameter is enabled.

Finally, fix some checkpatch whitespace complaints in lockdep.c.

Again, you must enable PROVE_RCU to see these new diagnostics. But you
are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Avoid unnecessary self-wakeup of per-CPU kthreads · 1eb52121

Authored by Shaohua Li on Jun 16, 2011

There are a number of cases where RCU can find additional work
for the per-CPU kthread within the context of that per-CPU kthread.
In such cases, the per-CPU kthread is already running, so attempting
to wake itself up does nothing except waste CPU cycles.  This commit
therefore checks to see if it is in the per-CPU kthread context,
omitting the wakeup in this case.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Use kthread_create_on_node() · 1f288094

Authored by Eric Dumazet on Jun 16, 2011

Commit a26ac245 (move TREE_RCU from softirq to kthread) added
per-CPU kthreads.  However, kthread creation uses kthread_create(), which
can put the kthread's stack and task struct on the wrong NUMA node.
Therefore, use kthread_create_on_node() instead of kthread_create()
so that the stacks and task structs are placed on the correct NUMA node.

A similar change was carried out in commit 94dcf29a (kthread:
use kthread_create_on_node()).

Also change rcutorture's priority-boost-test kthread creation.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tejun Heo <tj@kernel.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

26 Sep, 2011 1 commit

ptrace: PTRACE_LISTEN forgets to unlock ->siglock · f9d81f61

Authored by Oleg Nesterov on Sep 25, 2011

If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop ->siglock.
Reported-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Sep, 2011 2 commits

Make taskstats round statistics down to nearest 1k bytes/events · 58c3c3aa

Authored by Linus Torvalds on Sep 19, 2011

Even with just the interface limited to admin, there really is little
reason to give byte-per-byte counts for taskstats.  So round it down to
something less intrusive.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
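
The rounding this commit describes can be sketched in Python. This is an illustration only, not the kernel code: the helper name `round_down_to_kb` and the sample values are hypothetical, and "1k" is assumed here to mean 1024.

```python
KB = 1024  # assumption: "1k" in the commit title is taken as 1024


def round_down_to_kb(count: int) -> int:
    """Round a non-negative counter down to a multiple of KB."""
    # Clearing the low 10 bits is equivalent to count - (count % KB).
    return count & ~(KB - 1)


if __name__ == "__main__":
    for raw in (0, 1023, 1024, 5000):
        print(raw, "->", round_down_to_kb(raw))
```

Masking discards the low-order bits, so an observer sees only coarse counters rather than exact per-byte values.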

Make TASKSTATS require root access · 1a51410a

Authored by Linus Torvalds on Sep 19, 2011

Ok, this isn't optimal, since it means that 'iotop' needs admin
capabilities, and we may have to work on this some more.  But at the
same time it is very much not acceptable to let anybody just read
anybody else's IO statistics quite at this level.

Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative
to checking the capabilities by hand.
Reported-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15 Sep, 2011 1 commit

workqueue: lock cwq access in drain_workqueue · fa2563e4

Authored by Thomas Tuttle on Sep 14, 2011

Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to
make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing
and then incrementing nr_active when it activates a delayed work.

We discovered this when a corner case in one of our drivers resulted in
us trying to destroy a workqueue in which the remaining work would
always requeue itself again in the same workqueue.  We would hit this
race condition and trip the BUG_ON on workqueue.c:3080.
Signed-off-by: Thomas Tuttle <ttuttle@chromium.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Sep, 2011 1 commit

genirq: Make irq_shutdown() symmetric vs. irq_startup again · ed585a65

Authored by Geert Uytterhoeven on Sep 11, 2011

If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or
.irq_mask(), free_irq() crashes when jumping to NULL.
Fix this by only trying .irq_disable() and .irq_mask() if there's no
.irq_shutdown() provided.

This revives the symmetry with irq_startup(), which tries .irq_startup(),
.irq_enable(), and .irq_unmask(), and makes it consistent with the comment for
irq_chip.irq_shutdown() in <linux/irq.h>, which says:

 * @irq_shutdown:	shut down the interrupt (defaults to ->disable if NULL)

This is also how __free_irq() behaved before the big overhaul, cfr. e.g.
3b56f058 ("genirq: Remove bogus conditional"),
where the core interrupt code always overrode .irq_shutdown() to
.irq_disable() if .irq_shutdown() was NULL.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
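
The fallback order this commit describes (try `.irq_shutdown()` first, and only fall back to `.irq_disable()` or `.irq_mask()` when it is absent) can be sketched as below. This is a Python illustration, not the kernel implementation: the `IrqChip` class and `shutdown_irq()` helper are hypothetical stand-ins, and the final "none" case is an assumption added so that the sketch never calls a missing callback.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IrqChip:
    """Hypothetical stand-in for the optional callbacks named in the commit."""
    irq_shutdown: Optional[Callable[[], None]] = None
    irq_disable: Optional[Callable[[], None]] = None
    irq_mask: Optional[Callable[[], None]] = None


def shutdown_irq(chip: IrqChip) -> str:
    """Call the most specific callback available and return its name."""
    if chip.irq_shutdown is not None:
        chip.irq_shutdown()
        return "irq_shutdown"
    if chip.irq_disable is not None:
        chip.irq_disable()
        return "irq_disable"
    if chip.irq_mask is not None:
        chip.irq_mask()
        return "irq_mask"
    return "none"  # assumption: nothing to call, so do nothing


if __name__ == "__main__":
    log = []
    only_shutdown = IrqChip(irq_shutdown=lambda: log.append("shutdown"))
    print(shutdown_irq(only_shutdown), log)
```

A chip that provides only the shutdown callback, which is the crashing case in the report, is handled by the first branch and never reaches a missing callback.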

31 Aug, 2011 1 commit

perf_event: Fix broken calc_timer_values() · 7f310a5d

Authored by Eric B Munson on Jun 23, 2011

We detected a serious issue with PERF_SAMPLE_READ and
timing information when events were being multiplexed.

Samples would have time_running > time_enabled. That
was easy to reproduce with a libpfm4 example (ran 3
times to cause multiplexing on Core 2):

 $ syst_smpl -e uops_retired:freq=1 &
 $ syst_smpl -e uops_retired:freq=1 &
 $ syst_smpl -e uops_retired:freq=1 &
 IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
 syst_smpl: WARNING: time_running > time_enabled
	63277537998 uops_retired:freq=1 , scaled

The bug was not present in kernel up to (and including) 3.0. It turns
out the bug was introduced by the following commit:

commit c4794295

    events: Move lockless timer calculation into helper function

The parameters of the function got reversed yet the call sites
were not updated to reflect the change. That led to time_running
and time_enabled being swapped. That had no effect when there was
no multiplexing because in that case time_running = time_enabled
but it would show up in any other scenario.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>

29 Aug, 2011 4 commits

perf events: Fix slow and broken cgroup context switch code · a8d757ef

Authored by Stephane Eranian on Aug 25, 2011

The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 <not counted> cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 <not counted> cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
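
The penalty figures quoted in this commit message can be checked with a few lines of arithmetic. The sketch below only re-derives the percentages from the ctxsw/s numbers shown above; the function name is hypothetical.

```python
def ctxsw_penalty_percent(baseline: float, measured: float) -> float:
    """Relative drop in context switches per second, in percent."""
    return (baseline - measured) / baseline * 100.0


# Numbers copied from the pong runs in the commit message.
before_patch = ctxsw_penalty_percent(10684.51, 6674.61)
after_patch = ctxsw_penalty_percent(10775.30, 10687.80)

if __name__ == "__main__":
    print(f"before the patch: {before_patch:.1f}% penalty")
    print(f"after the patch:  {after_patch:.1f}% penalty")
```

The first value matches the roughly 37% penalty reported before the fix, and the second is well under the 2% bound stated afterwards.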

sched: Fix a memory leak in __sdt_free() · feff8fa0

Authored by WANG Cong on Aug 18, 2011

This patch fixes the following memory leak:

unreferenced object 0xffff880107266800 (size 512):
  comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81133940>] create_object+0x187/0x28b
    [<ffffffff814ac103>] kmemleak_alloc+0x73/0x98
    [<ffffffff811232ba>] __kmalloc_node+0x104/0x159
    [<ffffffff81044b98>] kzalloc_node.clone.97+0x15/0x17
    [<ffffffff8104cb90>] build_sched_domains+0xb7/0x7f3
    [<ffffffff8104d4df>] partition_sched_domains+0x1db/0x24a
    [<ffffffff8109ee4a>] do_rebuild_sched_domains+0x3b/0x47
    [<ffffffff810a00c7>] rebuild_sched_domains+0x10/0x12
    [<ffffffff8104d5ba>] sched_power_savings_store+0x6c/0x7b
    [<ffffffff8104d5df>] sched_mc_power_savings_store+0x16/0x18
    [<ffffffff8131322c>] sysdev_class_store+0x20/0x22
    [<ffffffff81193876>] sysfs_write_file+0x108/0x144
    [<ffffffff81135b10>] vfs_write+0xaf/0x102
    [<ffffffff81135d23>] sys_write+0x4d/0x74
    [<ffffffff814c8a42>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org # 3.0
Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Move blk_schedule_flush_plug() out of __schedule() · 9c40cef2

Authored by Thomas Gleixner on Jun 22, 2011

There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.

Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.

This fixes a deadlock in the scheduler where the
blk_schedule_flush_plug() callchain enables interrupts and thereby
allows a wakeup to happen of the task that's going to sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org # 2.6.39+
Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Separate the scheduler entry for preemption · c259e01a

Authored by Thomas Gleixner on Jun 22, 2011

Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.

To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntarily
goes to sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org # 2.6.39+
Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>

27 Aug, 2011 1 commit

All Arch: remove linkage for sys_nfsservctl system call · f5b94099

Authored by NeilBrown on Aug 26, 2011

The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26 Aug, 2011 2 commits

kernel/printk: do not turn off bootconsole in printk_late_init() if keep_bootcon · 4c30c6f5

Authored by Nishanth Aravamudan on Aug 25, 2011

It seems that 7bf69395 ("console: allow to retain boot console via
boot option keep_bootcon") doesn't always achieve what it aims, as when
printk_late_init() runs it unconditionally turns off all boot consoles.
With this patch, I am able to see more messages on the boot console in
KVM guests than I can without, when keep_bootcon is specified.

I think it is appropriate for the relevant -stable trees.  However, it's
more of an annoyance than a serious bug (ideally you don't need to keep
the boot console around as console handover should be working -- I was
encountering a situation where the console handover wasn't working and
not having the boot console available meant I couldn't see why).
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Cc: <stable@kernel.org>		[2.6.39.x, 3.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Add a personality to report 2.6.x version numbers · be27425d

Authored by Andi Kleen on Aug 19, 2011

I ran into a couple of programs which broke with the new Linux 3.0
version. Some of those were binary only. I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.

For example, all kinds of management software from HP doesn't work, unless
we pretend to run a 2.6 kernel.

$ uname -a
Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

$ hpacucli ctrl all show

Error: No controllers detected.

$ rpm -qf /usr/sbin/hpacucli
hpacucli-8.75-12.0

Another notable case is that Python now reports "linux3" from
sys.platform, which in turn can break things that were checking
sys.platform == "linux2":

https://bugzilla.mozilla.org/show_bug.cgi?id=664564

It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.

This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead. The x is the x in 3.x.

I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility with existing programs is important.

Some programs also read /proc/sys/kernel/osrelease. This can be worked
around in user space with mount --bind (and a mount namespace).

To use:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
gcc -o uname26 uname26.c
./uname26 program
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
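
Two points from this commit message lend themselves to a short Python sketch: the "2.6.40+x" mapping for a 3.x release string, and the more robust `.startswith()` platform check. Both helpers are hypothetical illustrations. In particular, how the kernel treats the rest of the release string after the version digits is an assumption here (the suffix is simply carried over).

```python
import re


def uname26_release(release: str) -> str:
    """Map a 3.x release string to the 2.6.(40+x) form described above."""
    match = re.match(r"^3\.(\d+)(?:\.\d+)?(.*)$", release)
    if match is None:
        return release  # not a 3.x release: leave it unchanged
    minor = int(match.group(1))
    suffix = match.group(2)  # assumption: the local suffix is kept as-is
    return f"2.6.{40 + minor}{suffix}"


def is_linux(platform: str) -> bool:
    """Prefix check that keeps working across "linux2" and "linux3"."""
    return platform.startswith("linux")


if __name__ == "__main__":
    print(uname26_release("3.0.0-08107-g97cd98f"))
    print(is_linux("linux2"), is_linux("linux3"))
```

An equality test against "linux2" is what breaks on 3.x kernels; the prefix check does not depend on the major version digit.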

24 Aug, 2011 1 commit

Revert "irq: Always set IRQF_ONESHOT if no primary handler is specified" · 69dd3d8e

Authored by Linus Torvalds on Aug 23, 2011

This reverts commit f3637a5f.

It turns out that this breaks several drivers, one example being OMAP
boards which use the on-board OMAP UARTs and the omap-serial driver that
will not boot to userspace after the commit.

Paul Walmsley reports that enabling CONFIG_DEBUG_SHIRQ reveals 'IRQ
handler type mismatch' errors:

  IRQ handler type mismatch for IRQ 74
  current handler: serial idle
  ...

and the reason is that setting IRQF_ONESHOT will now result in those
interrupt handlers having different IRQF flags, and thus being
unsharable.  So the commit log in the reverted commit:

                            "Since it is required for those users and
    there is no difference for others it makes sense to add this flag
    unconditionally."

is simply not true: there may not be any difference from an "actions at
irq time", but there is a *big* difference wrt this flag testing irq
management (see __setup_irq() in kernel/irq/manage.c).

One solution may be to stop verifying IRQF_ONESHOT in __setup_irq(), but
right now the safe course of action is to revert the change.  Let's
revisit this in a later merge window.
Reported-by: Paul Walmsley <paul@pwsan.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 Aug, 2011 1 commit

irqdesc: fix new kernel-doc warning · d522a0d1

Authored by Randy Dunlap on Aug 18, 2011

Fix kernel-doc warning in irqdesc.c:

  Warning(kernel/irq/irqdesc.c:353): No description found for parameter 'owner'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Aug, 2011 1 commit

PM / Domains: Fix build for CONFIG_PM_RUNTIME unset · 17f2ae7f

Authored by Rafael J. Wysocki on Aug 14, 2011

Function genpd_queue_power_off_work() is not defined when
CONFIG_PM_RUNTIME is unset, so pm_genpd_poweroff_unused() causes a build
error to happen in that case.  Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

13 Aug, 2011 1 commit

xfs: remove subdirectories · c59d87c4

Authored by Christoph Hellwig on Aug 12, 2011

Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
annoying subdirectories in the XFS source code.  Besides the large
number of file renames, the only changes are to the Makefile, a few
files including headers with the subdirectory prefix, and the binary
sysctl compat code that includes a header under fs/xfs/ from
kernel/.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

12 Aug, 2011 1 commit

move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997

Authored by Vasiliy Kulikov on Aug 08, 2011

The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only.  But the check created a new security threat: many poorly
written programs simply don't check the setuid() return code and believe it
cannot fail if executed with root privileges.  So, the check is removed
in this patch because privilege escalations related to buggy programs
are too frequent.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes.  Most daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested re-checking whether the NPROC limit is still
exceeded at the moment of execve().  If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter stepped down
under the limit, the deferred execve() failure because the NPROC limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

A similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
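
The flag logic this commit describes can be modelled in a few lines of Python. This is a simplified sketch, not the kernel code: the `User` and `Task` classes are hypothetical, the limit is stored on the user for brevity, `nproc_exceeded` stands in for PF_NPROC_EXCEEDED, and the exact comparison operators are an assumption.

```python
from dataclasses import dataclass


@dataclass
class User:
    processes: int    # current number of processes owned by this user
    nproc_limit: int  # stand-in for RLIMIT_NPROC (simplified)


@dataclass
class Task:
    user: User
    nproc_exceeded: bool = False  # stand-in for PF_NPROC_EXCEEDED


def over_limit(user: User) -> bool:
    # Assumption: the sketch uses one comparison everywhere.
    return user.processes >= user.nproc_limit


def set_user(task: Task, new_user: User) -> None:
    """Never fails; it only records whether the new user is over the limit."""
    task.user = new_user
    task.nproc_exceeded = over_limit(new_user)


def execve(task: Task) -> bool:
    """Fail only if the flag is set and the limit is still exceeded now."""
    if task.nproc_exceeded and over_limit(task.user):
        return False
    task.nproc_exceeded = False  # below the limit (still or again)
    return True


if __name__ == "__main__":
    daemon = Task(user=User(processes=1, nproc_limit=100))
    alice = User(processes=10, nproc_limit=10)
    set_user(daemon, alice)
    print("execve while over the limit:", execve(daemon))
    alice.processes = 5
    print("execve after the count drops:", execve(daemon))
```

Because `set_user()` cannot fail in this model, a program that ignores the setuid() return code no longer keeps running with the old credentials; the limit is enforced later, at execve().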

11 Aug, 2011 2 commits

blktrace: add FLUSH/FUA support · c09c47ca

Authored by Namhyung Kim on Aug 11, 2011

Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

 - WRITE:            W
 - WRITE_FLUSH:      FW
 - WRITE_FUA:        WF
 - WRITE_FLUSH_FUA:  FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
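
The positional use of the 'F' flag listed in this commit can be sketched as follows. The function and its boolean parameters are hypothetical and cover only the write cases shown in the list; the "R" used for non-write requests is an assumption of the sketch.

```python
def rwbs_flags(write: bool, flush: bool = False, fua: bool = False) -> str:
    """Build the flag string, using position to tell FLUSH from FUA."""
    flags = ""
    if flush:
        flags += "F"  # FLUSH precedes the write, so its 'F' comes first
    flags += "W" if write else "R"  # assumption: 'R' for non-write requests
    if fua:
        flags += "F"  # FUA follows the write, so its 'F' comes last
    return flags


if __name__ == "__main__":
    print("WRITE:          ", rwbs_flags(write=True))
    print("WRITE_FLUSH:    ", rwbs_flags(write=True, flush=True))
    print("WRITE_FUA:      ", rwbs_flags(write=True, fua=True))
    print("WRITE_FLUSH_FUA:", rwbs_flags(write=True, flush=True, fua=True))
```

A reader of the trace can then decode an 'F' before the 'W' as a flush and an 'F' after it as FUA.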

alarmtimers: Avoid possible denial of service with high freq periodic timers · 6af7e471

Authored by John Stultz on Aug 10, 2011

It's possible to jam up the alarm timers by setting very small interval
timers, which will cause the alarmtimer subsystem to spend all of its time
firing and restarting timers. This can effectively lock up a box.

A deeper fix is needed, closely mimicking the hrtimer code, but for now
just cap the interval to 100us to avoid userland hanging the system.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
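
The 100us cap described in this commit amounts to a lower bound on the timer interval. The sketch below is a Python illustration under stated assumptions: the function name and the (seconds, nanoseconds) representation are hypothetical, and leaving a zero interval untouched (treated as a non-periodic timer) is a choice of this sketch, not something the commit message specifies.

```python
from typing import Tuple

MIN_INTERVAL_NS = 100_000  # 100us expressed in nanoseconds


def cap_interval(sec: int, nsec: int) -> Tuple[int, int]:
    """Raise a too-small periodic interval to the 100us minimum."""
    if sec == 0 and nsec == 0:
        return sec, nsec  # assumption: zero means "not periodic"
    if sec == 0 and nsec < MIN_INTERVAL_NS:
        nsec = MIN_INTERVAL_NS
    return sec, nsec


if __name__ == "__main__":
    for interval in ((0, 50), (0, 200_000), (1, 5)):
        print(interval, "->", cap_interval(*interval))
```

With the floor in place, userland can no longer request an interval so short that the subsystem spends all of its time re-arming the timer.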

10 Aug, 2011 3 commits

alarmtimers: Memset itimerspec passed into alarm_timer_get · ea7802f6

Authored by John Stultz on Aug 04, 2011

Following common_timer_get, zero out the itimerspec passed in.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>

alarmtimers: Avoid possible null pointer traversal · 971c90bf

Authored by John Stultz on Aug 04, 2011

We don't check if old_setting is non null before assigning it, so
correct this.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>

cap_syslog: don't use WARN_ONCE for CAP_SYS_ADMIN deprecation warning · f2c0d026

Authored by Jonathan Nieder on Aug 08, 2011

syslog-ng versions before 3.3.0beta1 (2011-05-12) assume that
CAP_SYS_ADMIN is sufficient to access syslog, so ever since CAP_SYSLOG
was introduced (2010-11-25) they have triggered a warning.

Commit ee24aebf ("cap_syslog: accept CAP_SYS_ADMIN for now")
improved matters a little by making syslog-ng work again, just keeping
the WARN_ONCE().  But still, this is a warning that writes a stack trace
we don't care about to syslog, sets a taint flag, and alarms sysadmins
when nothing worse has happened than use of an old userspace with a
recent kernel.

Convert the WARN_ONCE to a printk_once to avoid that while continuing to
give userspace developers a hint that this is an unwanted
backward-compatibility feature and won't be around forever.
Reported-by: Ralf Hildebrandt <ralf.hildebrandt@charite.de>
Reported-by: Niels <zorglub_olsen@hotmail.com>
Reported-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Liked-by: Gergely Nagy <algernon@madhouse-project.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Aug, 2011 1 commit

lockdep: Fix wrong assumption in match_held_lock · 80e0401e

Authored by Peter Zijlstra on Aug 05, 2011

match_held_lock() was assuming it was being called on a lock class
that had already seen usage.

This condition was true for bug-free code using lockdep_assert_held(),
since you're in fact holding the lock when calling it. However the
assumption fails the moment you assume the assertion can fail, which
is the whole point of having the assertion in the first place.

Anyway, now that there are more lockdep_is_held() users, notably
__rcu_dereference_check(), it's much easier to trigger this since we
test for a number of locks and we only need to hold any one of them to
be good.
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1312547787.28695.2.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>

06 Aug, 2011 1 commit

jump label: Reduce the cycle count by changing the link order · b77f0f3c

Authored by Jason Baron on Aug 05, 2011

In the course of testing jump labels for use with the CFS
bandwidth controller, Paul Turner discovered that using jump
labels reduced the branch count and the instruction count, but
did not reduce the cycle count or wall time.

I noticed that having the jump_label.o included in the kernel
but not used in any way still caused this increase in cycle
count and wall time. Thus, I moved jump_label.o in the
kernel/Makefile, thus changing the link order, and presumably
moving it out of hot icache areas. This brought down the cycle
count/time as expected.

In addition to Paul's testing,  I've tested the patch using a
single 'static_branch()' in the getppid() path, and basically
running tight loops of calls to getppid(). Here are my results
for the branch disabled case:

With jump labels turned on (CONFIG_JUMP_LABEL), branch disabled:

 Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):

     3,969,510,217 instructions             #	   0.864 IPC     ( +-0.000% )
     4,592,334,954 cycles                     ( +-   0.046% )
       751,634,470 branches                   ( +-   0.000% )

        1.722635797  seconds time elapsed   ( +-   0.046% )

Jump labels turned off (CONFIG_JUMP_LABEL not set), branch
disabled:

 Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):

     4,009,611,846 instructions             #	   0.867 IPC     ( +-0.000% )
     4,622,210,580 cycles                     ( +-   0.012% )
       771,662,904 branches                   ( +-   0.000% )

        1.734341454  seconds time elapsed   ( +-   0.022% )
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: rth@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20110805204040.GG2522@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Paul Turner <pjt@google.com>

04 Aug, 2011 6 commits

lockdep: Clear whole lockdep_map on initialization · f59de899

Authored by Tejun Heo on Jul 14, 2011

lockdep_init_map() only initializes parts of lockdep_map and triggers
kmemcheck warning when it is copied as a whole. There isn't anything
to be gained by clearing selectively. memset() the whole structure
and remove loop for ->class_cache[] clearing.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=35532
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Christian Casteyde <casteyde.christian@free.fr>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35532
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110714131909.GJ3455@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

lockdep: Fix up warning · 70a0686a

Authored by Peter Zijlstra on Jul 25, 2011

On Sun, 2011-07-24 at 21:06 -0400, Arnaud Lacombe wrote:

> /src/linux/linux/kernel/lockdep.c: In function 'mark_held_locks':
> /src/linux/linux/kernel/lockdep.c:2471:31: warning: comparison of
> distinct pointer types lacks a cast

The warning is harmless in this case, but the below makes it go away.
Reported-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311588599.2617.56.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>

lockdep: Fix trace_hardirqs_on_caller() · 7d36b26b

Authored by Peter Zijlstra on Jul 26, 2011

Commit dd4e5d3a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]()
recursion") made a bit of a mess of the various checks and error
conditions.

In particular it moved the check for !irqs_disabled() before the
spurious enable test, resulting in some warnings.
Reported-by: Arnaud Lacombe <lacombar@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311679697.24752.28.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Boot up with usermodehelper disabled · 288d5abe

Authored by Linus Torvalds on Aug 03, 2011

The core device layer sends tons of uevent notifications for each device
it finds, and if the kernel has been built with a non-empty
CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
helper binary for all these events very early in the boot.

Not only won't the root filesystem even be mounted at that point, we
literally won't have necessarily even initialized all the process
handling data structures at that point, which causes no end of silly
problems even when the usermode helper doesn't actually succeed in
executing.

So just use our existing infrastructure to disable the usermodehelpers
to make the kernel start out with them disabled.  We enable them when
we've at least initialized stuff a bit.

Problems related to an uninitialized

	init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex

reported by various people.
Reported-by: Manuel Lauss <manuel.lauss@googlemail.com>
Reported-by: Richard Weinberger <richard@nod.at>
Reported-by: Marc Zyngier <maz@misterjones.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

288d5abe
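
The "start disabled, enable later" idea described in this commit can be sketched in plain userspace C. This is only an illustration, not the kernel's code: helper_disabled, run_helper(), helper_enable() and the helpers_run counter are hypothetical names, and returning -EBUSY for a refused request is an assumption made for the sketch.

```c
#include <errno.h>

/* Start out disabled, so early callers are refused instead of run too soon. */
static int helper_disabled = 1;

/* Counts helpers that were actually allowed to run (illustration only). */
static int helpers_run;

/* Hypothetical stand-in for launching a helper binary. */
static int run_helper(const char *path)
{
	if (helper_disabled)
		return -EBUSY;	/* assumed error code for "not yet" */
	(void)path;		/* a real implementation would execute path here */
	helpers_run++;
	return 0;
}

/* Called once enough of the system has been initialized. */
static void helper_enable(void)
{
	helper_disabled = 0;
}
```

With this shape, a request made before helper_enable() fails cleanly and has no side effects, which is the behavior the commit message asks for during early boot.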

taskstats: add_del_listener() should ignore !valid listeners · a7295898

Committed by Oleg Nesterov on Aug 03, 2011

When send_cpu_listeners() finds the orphaned listener it marks it as
!valid and drops listeners->sem.  Before it takes this sem for writing,
s->pid can be reused and add_del_listener() can wrongly try to re-use
this entry.

Change add_del_listener() to also check that ->valid is true.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a7295898
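
The check described in this commit can be illustrated with a small userspace sketch. It assumes a simplified singly linked list in place of the kernel's list_head, and struct listener and find_valid_listener() here are hypothetical stand-ins rather than the taskstats code itself.

```c
#include <stddef.h>

/* Simplified stand-in for a per-cpu listener entry. */
struct listener {
	int pid;
	int valid;		/* cleared once the owner is found to be gone */
	struct listener *next;
};

/*
 * Hypothetical lookup: return the live registration for pid, or NULL.
 * Entries marked !valid are skipped, so a recycled pid is not mistaken
 * for the orphaned listener that used to own the entry.
 */
static struct listener *find_valid_listener(struct listener *head, int pid)
{
	struct listener *s;

	for (s = head; s; s = s->next)
		if (s->pid == pid && s->valid)
			return s;
	return NULL;
}
```

A caller that registers a listener would treat a NULL result as "not registered yet" and add a fresh entry, leaving the stale one to be cleaned up separately.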

taskstats: add_del_listener() shouldn't use the wrong node · dfc428b6

Committed by Oleg Nesterov on Aug 03, 2011

1. Commit 26c4caea "don't allow duplicate entries in listener mode"
   changed add_del_listener(REGISTER) so that "next_cpu:" can reuse the
   listener allocated for the previous cpu, this doesn't look exactly
   right even if minor.

   Change the code to kfree() in the already-registered case. This case
   is unlikely anyway, so the extra kmalloc_node() shouldn't hurt, and
   the result looks more correct and clean.

2. use the plain list_for_each_entry() instead of _safe() to scan
   listeners->list.

3. Remove the unneeded INIT_LIST_HEAD(&s->list), we are going to
   list_add(&s->list).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dfc428b6

02 Aug, 2011 4 commits

kdb,kgdb: Allow arbitrary kgdb magic knock sequences · 37f86b46

Committed by Jason Wessel on May 24, 2011

The first packet that gdb sends when the kernel is in kdb mode seems
to change with every release of gdb.  Instead of continuing to add
many different gdb packets, change kdb to automatically look for
anything that looks like a gdb packet.

Example 1 cold start test:
echo g > /proc/sysrq-trigger
$D#44+

Example 2 cold start test:
echo g > /proc/sysrq-trigger
$3#33

The second one should re-enter kdb's shell right away and is purely a
test.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

37f86b46
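
Both test strings in this commit have the shape of a gdb remote serial protocol packet: a '$', a payload, a '#', and two hex digits holding the payload byte sum modulo 256. The sketch below shows one way to recognize that shape in userspace C. It is not the kdb implementation, which may well apply a looser test, and looks_like_gdb_packet() and hex_val() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: return 1 if buf starts with a well-formed packet
 * "$<payload>#<xx>", where xx is the payload byte sum modulo 256 written
 * as two hex digits; return 0 otherwise.
 */
static int looks_like_gdb_packet(const char *buf)
{
	const char *hash;
	const char *p;
	unsigned char sum = 0;
	int hi, lo;

	if (buf[0] != '$')
		return 0;
	hash = strchr(buf + 1, '#');
	if (!hash)
		return 0;
	for (p = buf + 1; p < hash; p++)
		sum += (unsigned char)*p;	/* wraps modulo 256 */
	hi = hex_val(hash[1]);
	if (hi < 0)
		return 0;
	lo = hex_val(hash[2]);
	if (lo < 0)
		return 0;
	return sum == (unsigned char)((hi << 4) | lo);
}
```

Under this check, "$D#44+" and "$3#33" from the examples above are both accepted, since 'D' is 0x44 and '3' is 0x33, while ordinary shell input that does not begin with '$' is rejected.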

kdb: Remove all references to DOING_KGDB2 · d613d828

Committed by Jason Wessel on May 23, 2011

The DOING_KGDB2 was originally a state variable for one of the two
ways to automatically transition from kdb to kgdb. Purge all these
variables and just use one single state for the transition.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

d613d828

kdb,kgdb: Implement switch and pass buffer from kdb -> gdb · f679c498

Committed by Jason Wessel on May 23, 2011

When switching from kdb mode to kgdb mode, packets were getting lost
depending on the size of the fifo queue of the serial chip.  When gdb
initially connects while the kernel is in kdb mode, kdb should hand its
entire character buffer over to the gdbstub when switching connections.

Previously kdb was zeroing out the character buffer, and this could
lead to gdb failing to connect at all, or to a lengthy pause on the
initial connect.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

f679c498

kdb: cleanup unused variables missed in the original kdb merge · 3bdb65ec

Committed by Jason Wessel on Jun 30, 2011

The BTARGS and BTSYMARG variables do not have any function in the
mainline version of kdb.
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

3bdb65ec
