1. 18 Nov 2010, 6 commits
    • sched: Fix update_cfs_load() synchronization · e33078ba
      Authored by Paul Turner
      Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with
      the put path since nr_running accounting occurs at deactivation.
      
      It's also not safe to make the removal decision based on load_avg as this fails
      with both high periods and low shares.  Resolve this by clipping history after
      4 periods without activity.
      
      Note: the clipping above will always occur via update_shares(), since in
      the last-task-sleep case that task will still be cfs_rq->curr when
      update_cfs_load() is called.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.933428187@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
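      A minimal model of the clipping rule in plain C; the struct below is an
      illustrative stand-in for the kernel's cfs_rq bookkeeping, and the
      per-period decay step is an assumption for the example, not the
      kernel's actual arithmetic:

      #include <stdint.h>

      /* Toy model: once four periods pass with no activity, discard the
       * load history outright instead of decaying it further, so the
       * removal decision no longer depends on load_avg magnitude (which
       * misbehaves with high periods and low shares). */
      struct cfs_rq_model {
              uint64_t load_avg;      /* decayed load contribution */
              unsigned idle_periods;  /* consecutive periods without activity */
      };

      static void update_cfs_load_model(struct cfs_rq_model *cfs_rq, int active)
      {
              if (active) {
                      cfs_rq->idle_periods = 0;
                      return;
              }

              if (++cfs_rq->idle_periods >= 4)
                      cfs_rq->load_avg = 0;   /* clip history */
              else
                      cfs_rq->load_avg >>= 1; /* illustrative decay */
      }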
    • sched: Fix load corruption from update_cfs_shares() · f0d7442a
      Authored by Paul Turner
      As part of enqueue_entity() both a new entity weight and its contribution
      to the queueing cfs_rq / rq are updated.  Since update_cfs_shares() will
      only update the queueing weights when the entity is on_rq (which in this
      case it is not yet), there's a dependency loop here:
      
      update_cfs_shares() needs account_entity_enqueue() to update cfs_rq->load.weight
      account_entity_enqueue() needs the updated weight for the queueing cfs_rq load [*]
      
      Fix this and avoid spurious dequeue/enqueues by issuing update_cfs_shares as
      if we had accounted the enqueue already.
      
      This was also resulting in rq->load corruption previously.
      
      [*]: this dependency also exists when using the group cfs_rq with
           update_cfs_shares(), as the weight of the enqueued entity changes
           without the load being updated.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.844900206@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
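      The resulting enqueue ordering can be sketched as follows; the names
      follow kernel/sched_fair.c of this era, but signatures and surrounding
      detail are simplified:

      static void enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
      {
              update_curr(cfs_rq);
              update_cfs_load(cfs_rq);
              /*
               * Pass the incoming weight so shares are computed as if the
               * enqueue had already been accounted, which breaks the loop
               * without a spurious dequeue/enqueue cycle.
               */
              update_cfs_shares(cfs_rq, se->load.weight);
              account_entity_enqueue(cfs_rq, se); /* cfs_rq->load.weight += ... */
              /* ... place the entity on the rbtree, update stats ... */
      }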
    • sched: Make tg_shares_up() walk on-demand · 9e3081ca
      Authored by Peter Zijlstra
      Make tg_shares_up() use the active cgroup list. This means we cannot do a
      strict bottom-up walk of the hierarchy, but assuming a very wide tree with
      a small number of active groups it should be a win.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.754159484@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
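      The shape of the resulting walk, simplified from the patch: only
      cfs_rqs that registered themselves as active are visited, under RCU:

      static void update_shares(int cpu)
      {
              struct cfs_rq *cfs_rq;
              struct rq *rq = cpu_rq(cpu);

              rcu_read_lock();
              /* Active groups only, not the full hierarchy. */
              for_each_leaf_cfs_rq(rq, cfs_rq)
                      update_shares_cpu(cfs_rq->tg, cpu);
              rcu_read_unlock();
      }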
    • sched: Implement on-demand (active) cfs_rq list · 3d4b47b4
      Authored by Peter Zijlstra
      Make certain load-balance actions scale with the number of active cgroups
      instead of the number of existing cgroups.
      
      This makes wakeup/sleep paths more expensive, but is a win for systems
      where the vast majority of existing cgroups are idle.
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.666535048@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
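      A sketch of the bookkeeping this introduces (simplified; list-ordering
      details elided): a cfs_rq is linked onto a per-rq leaf list when it
      first gains a runnable entity and unlinked when it goes idle, so the
      walks above only ever see active groups:

      static void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
      {
              if (cfs_rq->on_list)
                      return;

              list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
                           &rq_of(cfs_rq)->leaf_cfs_rq_list);
              cfs_rq->on_list = 1;
      }

      static void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
      {
              if (cfs_rq->on_list) {
                      list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
                      cfs_rq->on_list = 0;
              }
      }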
    • sched: Rewrite tg_shares_up() · 2069dd75
      Authored by Peter Zijlstra
      By tracking a per-cpu load-avg for each cfs_rq and folding it into a
      global task_group load on each tick we can rework tg_shares_up() to be
      strictly per-cpu.
      
      This should improve cpu-cgroup performance for SMP systems
      significantly.
      
      [ Paul: changed to use queueing cfs_rq + bug fixes ]
      Signed-off-by: Paul Turner <pjt@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101115234937.580480400@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
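      The per-cpu share computation this enables can be sketched as below;
      the idea (this cpu's load over the globally folded sum) follows the
      patch, while the function name and clamping details are simplified:

      static long calc_shares(struct task_group *tg, struct cfs_rq *cfs_rq)
      {
              long tg_weight, load, shares;

              /* Sum of per-cpu load averages, folded in on each tick. */
              tg_weight = atomic_read(&tg->load_weight);
              load = cfs_rq->load.weight;

              shares = tg->shares * load;
              if (tg_weight)
                      shares /= tg_weight;

              if (shares < MIN_SHARES)        /* floor, as in sched.c */
                      shares = MIN_SHARES;
              if (shares > tg->shares)
                      shares = tg->shares;

              return shares;
      }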
    • sched: Simplify cpu-hot-unplug task migration · 48c5ccae
      Authored by Peter Zijlstra
      While discussing the need for sched_idle_next(), Oleg remarked that
      since try_to_wake_up() ensures sleeping tasks will end up running on a
      sane cpu, we can do away with migrate_live_tasks().
      
      If we then extend the existing hack of migrating current from
      CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
      need for the sched_idle_next() abomination disappears as well, since
      idle will be the only possible thread left after the migration thread
      stops.
      
      This greatly simplifies the hot-unplug task migration path, as can be
      seen from the resulting code reduction (and about half the new lines
      are comments).
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1289851597.2109.547.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
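      The core of the simplified path, close to the patch's migrate_tasks()
      but with the rq-lock juggling elided:

      static void migrate_tasks(unsigned int dead_cpu)
      {
              struct rq *rq = cpu_rq(dead_cpu);
              struct task_struct *next;
              int dest_cpu;

              /*
               * Runs from CPU_DYING with the machine stopped, so the
               * dying rq cannot change underneath us: pick tasks off it
               * one by one until only the idle task remains.
               */
              for (;;) {
                      if (rq->nr_running == 1)        /* only idle left */
                              break;

                      next = pick_next_task(rq);
                      next->sched_class->put_prev_task(rq, next);

                      dest_cpu = select_fallback_rq(dead_cpu, next);
                      __migrate_task(next, dead_cpu, dest_cpu);
              }
      }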
  2. 16 Nov 2010, 1 commit
  3. 12 Nov 2010, 3 commits
  4. 11 Nov 2010, 1 commit
    • perf_events: Fix time tracking in samples · eed01528
      Authored by Stephane Eranian
      This patch corrects time tracking in samples. Without this patch both
      time_enabled and time_running are bogus when the user asks for
      PERF_SAMPLE_READ.
      
      One uses PERF_SAMPLE_READ to sample the values of other counters in each
      sample. Because of multiplexing, it is necessary to know both time_enabled
      and time_running to be able to scale counts correctly.
      
      In this second version of the patch, we maintain a shadow
      copy of ctx->time which allows us to compute ctx->time without
      calling update_context_time() from NMI context. We avoid the
      issue that update_context_time() must always be called with
      ctx->lock held.
      
      We do not keep shadow copies of the other event timings because if the
      lead event is overflowing then it is active, and thus it has been
      scheduled in via event_sched_in(), in which case neither tstamp_stopped
      nor tstamp_running can be modified.
      
      This timing logic only applies to samples when PERF_SAMPLE_READ
      is used.
      
      Note that this patch does not address timing issues related
      to sampling inheritance between tasks. This will be addressed
      in a future patch.
      
      With this patch, the libpfm4 example task_smpl now reports
      correct counts (shown on 2.4GHz Core 2):
      
      $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears  noploop 5
      noploop for 5 seconds
      IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3
      	2,400,000,254 unhalted_core_cycles:u (33)
      	2,399,273,744 instructions_retired:u (34)
      	53,340 baclears (35)
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
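      The shadow-copy trick reduces to an offset maintained under ctx->lock
      and consumed lock-free; a sketch with illustrative signatures:

      /* Updated whenever ctx->time is advanced, with ctx->lock held. */
      static void perf_set_shadow_time(struct perf_event *event,
                                       u64 ctx_time, u64 now)
      {
              event->shadow_ctx_time = ctx_time - now;
      }

      /* Safe from NMI context: reconstructs context time without the lock. */
      static u64 perf_shadow_ctx_time(struct perf_event *event)
      {
              return event->shadow_ctx_time + perf_clock();
      }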
  5. 10 Nov 2010, 1 commit
    • block: remove REQ_HARDBARRIER · 02e031cb
      Authored by Christoph Hellwig
      REQ_HARDBARRIER is dead now, so remove the leftovers.  What's left
      at this point is:
      
       - various checks inside the block layer.
       - sanity checks in bio based drivers.
       - now unused bio_empty_barrier helper.
       - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's been dead for a
         while, but Xen really needs to sort out its barrier situation.
       - setting of ordered tags in uas - dead code copied from old scsi
         drivers.
       - scsi different retry for barriers - it's dead and should have been
         removed when flushes were converted to FS requests.
       - blktrace handling of barriers - removed.  Someone who knows blktrace
         better should add support for REQ_FLUSH and REQ_FUA, though.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
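      For reference, the idiom that replaced hard barriers: a cache pre-flush
      plus a forced-unit-access write, expressed through REQ_FLUSH/REQ_FUA.
      A sketch, assuming a prepared bio (WRITE_FLUSH_FUA bundled these flags
      in kernels of this era):

      static void submit_durable_write(struct bio *bio)
      {
              /*
               * Flush the device cache, write the data, and force it to
               * stable storage, without draining the whole queue the way
               * REQ_HARDBARRIER did.
               */
              submit_bio(WRITE_FLUSH_FUA, bio);
      }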
  6. 06 Nov 2010, 2 commits
    • watchdog: Fix section mismatch and potential undefined behavior. · 433039e9
      Authored by David Daney
      Commit d9ca07a0 ("watchdog: Avoid kernel crash when disabling
      watchdog") introduces a section mismatch.
      
      Now that we reference no_watchdog from non-__init code it can no longer
      be __initdata.
      Signed-off-by: David Daney <ddaney@caviumnetworks.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
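      The essence of the fix, as an illustrative anti-pattern (some_flag and
      runtime_check are hypothetical names, not from the patch):

      #include <linux/init.h>

      /* __initdata places the variable in .init.data, which the kernel
       * frees once boot completes. */
      static int some_flag __initdata;

      /* Referencing it from non-__init code is a section mismatch: after
       * the init sections are freed, this read is undefined behavior.
       * The fix is simply to drop the __initdata annotation. */
      static int runtime_check(void)
      {
              return some_flag;
      }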
    • posix-cpu-timers: workaround to suppress the problems with mt exec · e0a70217
      Authored by Oleg Nesterov
      posix-cpu-timers.c correctly assumes that the dying process does
      posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD
      timers from signal->cpu_timers list.
      
      But it also assumes that timer->it.cpu.task is always the group leader,
      and thus that a dead ->task means a dead thread group.
      
      This is obviously not true after de_thread() changes the leader. After
      that, almost every posix_cpu_timer_*() method has problems.
      
      It is not simple to fix this bug correctly. First of all, I think
      that timer->it.cpu should use struct pid instead of task_struct.
      Also, the locking should be reworked completely. In particular,
      tasklist_lock should not be used at all. This all needs a lot of
      nontrivial and hard-to-test changes.
      
      Change __exit_signal() to do posix_cpu_timers_exit_group() when the old
      leader dies during exec. This is not a real fix, just a temporary hack to
      hide the problem for 2.6.37 and stable. IOW, this is obviously wrong, but
      it matches what we currently have anyway: cpu timers do not work after mt
      exec.
      
      In theory this change adds another race. The exiting leader can
      detach the timers which were attached to the new leader. However,
      the window between de_thread() and release_task() is small, we
      can pretend that sys_timer_create() was called before de_thread().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
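      The workaround's shape in exit.c, simplified from the patch; the
      has_group_leader_pid() test identifies an old leader that de_thread()
      has replaced:

      static void __exit_signal(struct task_struct *tsk)
      {
              bool group_dead = thread_group_leader(tsk);
              /* ... */
              if (group_dead) {
                      posix_cpu_timers_exit_group(tsk);
              } else {
                      /*
                       * Temporary hack: the old leader is dying during
                       * exec, so flush the group's CPU timers here rather
                       * than leave them pointing at a dead ->task.
                       */
                      if (unlikely(has_group_leader_pid(tsk)))
                              posix_cpu_timers_exit_group(tsk);
              }
              /* ... */
      }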
  7. 05 Nov 2010, 1 commit
  8. 30 Oct 2010, 12 commits
  9. 29 Oct 2010, 1 commit
  10. 28 Oct 2010, 12 commits