提交 · a7295898a1d2e501427f557111c2b4bdfc90b1ed · openeuler / raspberrypi-kernel

04 8月, 2011 2 次提交

taskstats: add_del_listener() should ignore !valid listeners · a7295898

由 Oleg Nesterov 提交于 8月 03, 2011

When send_cpu_listeners() finds the orphaned listener it marks it as
!valid and drops listeners->sem.  Before it takes this sem for writing,
s->pid can be reused and add_del_listener() can wrongly try to re-use
this entry.

Change add_del_listener() to check ->valid = T.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NVasiliy Kulikov <segoon@openwall.com>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a7295898

taskstats: add_del_listener() shouldn't use the wrong node · dfc428b6

由 Oleg Nesterov 提交于 8月 03, 2011

1. Commit 26c4caea "don't allow duplicate entries in listener mode"
   changed add_del_listener(REGISTER) so that "next_cpu:" can reuse the
   listener allocated for the previous cpu, this doesn't look exactly
   right even if minor.

   Change the code to kfree() in the already-registered case, this case
   is unlikely anyway so the extra kmalloc_node() shouldn't hurt but
   looke more correct and clean.

2. use the plain list_for_each_entry() instead of _safe() to scan
   listeners->list.

3. Remove the unneeded INIT_LIST_HEAD(&s->list), we are going to
   list_add(&s->list).
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NVasiliy Kulikov <segoon@openwall.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: NJerome Marchand <jmarchan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dfc428b6

02 8月, 2011 4 次提交

kdb,kgdb: Allow arbitrary kgdb magic knock sequences · 37f86b46

由 Jason Wessel 提交于 5月 24, 2011

The first packet that gdb sends when the kernel is in kdb mode seems
to change with every release of gdb.  Instead of continuing to add
many different gdb packets, change kdb to automatically look for any
thing that looks like a gdb packet.

Example 1 cold start test:
echo g > /proc/sysrq-trigger
$D#44+

Example 2 cold start test:
echo g > /proc/sysrq-trigger
$3#33

The second one should re-enter kdb's shell right away and is purely a
test.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

37f86b46

kdb: Remove all references to DOING_KGDB2 · d613d828

由 Jason Wessel 提交于 5月 23, 2011

The DOING_KGDB2 was originally a state variable for one of the two
ways to automatically transition from kdb to kgdb. Purge all these
variables and just use one single state for the transition.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

d613d828

kdb,kgdb: Implement switch and pass buffer from kdb -> gdb · f679c498

由 Jason Wessel 提交于 5月 23, 2011

When switching from kdb mode to kgdb mode packets were getting lost
depending on the size of the fifo queue of the serial chip.  When gdb
initially connects if it is in kdb mode it should entirely send any
character buffer over to the gdbstub when switching connections.

Previously kdb was zero'ing out the character buffer and this could
lead to gdb failing to connect at all, or a lengthy pause could occur
on the initial connect.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

f679c498

kdb: cleanup unused variables missed in the original kdb merge · 3bdb65ec

由 Jason Wessel 提交于 6月 30, 2011

The BTARGS and BTSYMARG variables do not have any function in the
mainline version of kdb.
Reported-by: NTim Bird <tim.bird@am.sony.com>
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

3bdb65ec

31 7月, 2011 1 次提交

resources: Add lookup_resource() · 1c388919

由 Geert Uytterhoeven 提交于 5月 07, 2011

Add a function to find an existing resource by a resource start address.
This allows to implement simple allocators (with a malloc/free-alike API)
on top of the resource system.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>

1c388919

28 7月, 2011 4 次提交

dt/irq: add irq_domain_generate_simple() helper · 7e713301

由 Grant Likely 提交于 7月 26, 2011

irq_domain_generate_simple() is an easy way to generate an irq translation
domain for simple irq controllers.  It assumes a flat 1:1 mapping from
hardware irq number to an offset of the first linux irq number assigned
to the controller
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

7e713301

irq: add irq_domain translation infrastructure · 08a543ad

由 Grant Likely 提交于 7月 26, 2011

This patch adds irq_domain infrastructure for translating from
hardware irq numbers to linux irqs.  This is particularly important
for architectures adding device tree support because the current
implementation (excluding PowerPC and SPARC) cannot handle
translation for more than a single interrupt controller.  irq_domain
supports device tree translation for any number of interrupt
controllers.

This patch converts x86, Microblaze, ARM and MIPS to use irq_domain
for device tree irq translation.  x86 is untested beyond compiling it,
irq_domain is enabled for MIPS and Microblaze, but the old behaviour is
preserved until the core code is modified to actually register an
irq_domain yet.  On ARM it works and is required for much of the new
ARM device tree board support.

PowerPC has /not/ been converted to use this new infrastructure.  It
is still missing some features before it can replace the virq
infrastructure already in powerpc (see documentation on
irq_domain_map/unmap for details).  Followup patches will add the
missing pieces and migrate PowerPC to use irq_domain.

SPARC has its own method of managing interrupts from the device tree
and is unaffected by this change.
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

08a543ad

[media] v4l2-compat-ioctl32: add VIDIOC_DQEVENT support · 2330fb82

由 Hans Verkuil 提交于 6月 07, 2011

Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>

2330fb82

signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked() · c1095c6d

由 Oleg Nesterov 提交于 7月 27, 2011

sys_ssetmask(), sys_rt_sigsuspend() and compat_sys_rt_sigsuspend()
change ->blocked directly.  This is not correct, see the changelog in
e6fa16ab "signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Another change is that now we are doing ->saved_sigmask = ->blocked
lockless, it doesn't make any sense to do this under ->siglock.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NMatt Fleming <matt.fleming@linux.intel.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1095c6d

27 7月, 2011 6 次提交

atomic: use <linux/atomic.h> · 60063497

由 Arun Sharma 提交于 7月 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: NArun Sharma <asharma@fb.com>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NMike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

60063497

panic: panic=-1 for immediate reboot · 4302fbc8

由 Hugh Dickins 提交于 7月 26, 2011

When a kernel BUG or oops occurs, ChromeOS intends to panic and
immediately reboot, with stacktrace and other messages preserved in RAM
across reboot.

But the longer we delay, the more likely the user is to poweroff and
lose the info.

panic_timeout (seconds before rebooting) is set by panic= boot option or
sysctl or /proc/sys/kernel/panic; but 0 means wait forever, so at
present we have to delay at least 1 second.

Let a negative number mean reboot immediately (with the small cosmetic
benefit of suppressing that newline-less "Rebooting in %d seconds.."
message).
Signed-off-by: NHugh Dickins <hughd@chromium.org>
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4302fbc8

gcov: disable CONSTRUCTORS for UML · 947be5df

由 Vitaliy Ivanov 提交于 7月 26, 2011

Selecting GCOV for UML causing configuration mismatch:

  warning: (GCOV_KERNEL) selects CONSTRUCTORS which has unmet direct dependencies (!UML)

Constructors are not needed for UML.
Signed-off-by: NVitaliy Ivanov <vitalivanov@gmail.com>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: NRichard Weinberger <richard@nod.at>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

947be5df

ipc: introduce shm_rmid_forced sysctl · b34a6b1d

由 Vasiliy Kulikov 提交于 7月 26, 2011

Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
memory objects in current ipc namespace will be automatically forced to
use IPC_RMID.

The POSIX way of handling shmem allows one to create shm objects and
call shmdt(), leaving shm object associated with no process, thus
consuming memory not counted via rlimits.

With shm_rmid_forced=1 the shared memory object is counted at least for
one process, so OOM killer may effectively kill the fat process holding
the shared memory.

It obviously breaks POSIX - some programs relying on the feature would
stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
"orphaned" memory.  Use shm_rmid_forced=0 by default for compatability
reasons.

The feature was previously impemented in -ow as a configure option.

[akpm@linux-foundation.org: fix documentation, per Randy]
[akpm@linux-foundation.org: fix warning]
[akpm@linux-foundation.org: readability/conventionality tweaks]
[akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Solar Designer <solar@openwall.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b34a6b1d

kernel/fork.c: fix a few coding style issues · fb0a685c

由 Daniel Rebelo de Oliveira 提交于 7月 26, 2011

Signed-off-by: NDaniel Rebelo de Oliveira <psykon@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fb0a685c

cpusets: randomize node rotor used in cpuset_mem_spread_node() · 778d3b0f

由 Michal Hocko 提交于 7月 26, 2011

[ This patch has already been accepted as commit 0ac0c0d0 but later
  reverted (commit 35926ff5) because it itroduced arch specific
  __node_random which was defined only for x86 code so it broke other
  archs.  This is a followup without any arch specific code.  Other than
  that there are no functional changes.]

Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is
that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
at node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number
of the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
[mhocko@suse.cz: Make it arch independent]
[akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
Signed-off-by: NJack Steiner <steiner@sgi.com>
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Menage <menage@google.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

778d3b0f

26 7月, 2011 4 次提交

kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n · 626a0312

由 Stephen Boyd 提交于 7月 25, 2011

If CONFIG_IKCONFIG=m but CONFIG_IKCONFIG_PROC=n we get a module that has
no MODULE_LICENSE definition.  Move the MODULE_*() definitions outside the
CONFIG_IKCONFIG_PROC #ifdef to prevent this configuration from tainting
the kernel.
Signed-off-by: NStephen Boyd <bebarino@gmail.com>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

626a0312

notifiers: sys: move reboot notifiers into reboot.h · c5f41752

由 Amerigo Wang 提交于 7月 25, 2011

It is not necessary to share the same notifier.h.

This patch already moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.

[amwang@redhat.com: make allyesconfig succeed on ppc64]
Signed-off-by: NWANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NWANG Cong <amwang@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5f41752

devres: fix possible use after free · ae891a1b

由 Maxin B John 提交于 7月 25, 2011

devres uses the pointer value as key after it's freed, which is safe but
triggers spurious use-after-free warnings on some static analysis tools.
Rearrange code to avoid such warnings.
Signed-off-by: NMaxin B. John <maxin.john@gmail.com>
Reviewed-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ae891a1b

mm/futex: fix futex writes on archs with SW tracking of dirty & young · 2efaca92

由 Benjamin Herrenschmidt 提交于 7月 25, 2011

I haven't reproduced it myself but the fail scenario is that on such
machines (notably ARM and some embedded powerpc), if you manage to hit
that futex path on a writable page whose dirty bit has gone from the PTE,
you'll livelock inside the kernel from what I can tell.

It will go in a loop of trying the atomic access, failing, trying gup to
"fix it up", getting succcess from gup, go back to the atomic access,
failing again because dirty wasn't fixed etc...

So I think you essentially hang in the kernel.

The scenario is probably rare'ish because affected architecture are
embedded and tend to not swap much (if at all) so we probably rarely hit
the case where dirty is missing or young is missing, but I think Shan has
a piece of SW that can reliably reproduce it using a shared writable
mapping & fork or something like that.

On archs who use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young unaccessible in the
PTE.

Additionally, some architectures might lazily flush the TLB when relaxing
write protection (by doing only a local flush), and expect a fault to
invalidate the stale entry if it's still present on another processor.

The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
"fix it up" by causing get_user_pages() which would then be equivalent to
taking the fault.

However that isn't the case.  get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state.  It will eventually
update those bits ...  in the struct page, but not in the PTE.

Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.

Basically, gup is the wrong interface for the job.  The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.

The futex code currently attempts to write to user memory within a
pagefault disabled section, and if that fails, tries to fix it up using
get_user_pages().

This doesn't work on archs where the dirty and young bits are maintained
by software, since they will gate access permission in the TLB, and will
not be updated by gup().

In addition, there's an expectation on some archs that a spurious write
fault triggers a local TLB flush, and that is missing from the picture as
well.

I decided that adding those "features" to gup() would be too much for this
already too complex function, and instead added a new simpler
fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
which the futex code can call.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: NShan Hai <haishan.bai@gmail.com>
Tested-by: NShan Hai <haishan.bai@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <darren.hart@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2efaca92

24 7月, 2011 4 次提交

module: add /sys/module/<name>/uevent files · 88bfa324

由 Kay Sievers 提交于 7月 24, 2011

Userspace wants to manage module parameters with udev rules.
This currently only works for loaded modules, but not for
built-in ones.

To allow access to the built-in modules we need to
re-trigger all module load events that happened before any
userspace was running. We already do the same thing for all
devices, subsystems(buses) and drivers.

This adds the currently missing /sys/module/<name>/uevent files
to all module entries.
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split & trivial fix)

88bfa324

module: change attr callbacks to take struct module_kobject · 4befb026

由 Kay Sievers 提交于 7月 24, 2011

This simplifies the next patch, where we have an attribute on a
builtin module (ie. module == NULL).
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split into 2)

4befb026

modules: add default loader hook implementations · 74e08fcf

由 Jonas Bonn 提交于 6月 30, 2011

The module loader code allows architectures to hook into the code by
providing a small number of entry points that each arch must implement.
This patch provides __weakly linked generic implementations of these
entry points for architectures that don't need to do anything special.
Signed-off-by: NJonas Bonn <jonas@southpole.se>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

74e08fcf

param: fix return value handling in param_set_* · 81c74136

由 Satoru Moriya 提交于 5月 26, 2011

In STANDARD_PARAM_DEF, param_set_* handles the case in which strtolfn
returns -EINVAL but it may return -ERANGE. If it returns -ERANGE,
param_set_* may set uninitialized value to the paramerter. We should handle
both cases.

The one of the cases in which strtolfn() returns -ERANGE is following:

 *Type of module parameter is long
 *Set the parameter more than LONG_MAX
Signed-off-by: NSatoru Moriya <satoru.moriya@hds.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

81c74136

22 7月, 2011 12 次提交

sched: Cleanup duplicate local variable in [enqueue|dequeue]_task_fair · 0f317143

由 Lin Ming 提交于 7月 22, 2011

No need to define a new "cfs_rq" variable in the "for" block.
Just use the one at the top of the function.
Signed-off-by: NLin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311297271.3938.1352.camel@minggr.sh.intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

0f317143

lockdep: Fix lockdep_no_validate against IRQ states · efbe2eee

由 Peter Zijlstra 提交于 7月 07, 2011

Thomas noticed that a lock marked with lockdep_set_novalidate_class()
will still trigger warnings for IRQ inversions. Cure this by skipping
those when marking irq state.
Reported-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2dp5vmpsxeraqm42kgww6ge2@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

efbe2eee

perf: Remove perf_event_attr::type check · 9985c20f

由 Lin Ming 提交于 6月 30, 2011

PMU type id can be allocated dynamically, so perf_event_attr::type check
when copying attribute from userspace to kernel is not valid.
Signed-off-by: NLin Ming <ming.m.lin@intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309421396-17438-4-git-send-email-ming.m.lin@intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

9985c20f

sched: Replace use of entity_key() · 2bd2d6f2

由 Stephan Baerwolf 提交于 7月 20, 2011

"entity_key()" is only used in "__enqueue_entity()" and
its only function is to subtract a tasks vruntime by
its groups minvruntime.
Before this patch a rbtree enqueue-decision is done by
comparing two tasks in the style:

	"if (entity_key(cfs_rq, se) < entity_key(cfs_rq, entry))"

which would be

	"if (se->vruntime-cfs_rq->min_vruntime < entry->vruntime-cfs_rq->min_vruntime)"

or (if reducing cfs_rq->min_vruntime out)

	"if (se->vruntime < entry->vruntime)"

which is

	"if (entity_before(se, entry))"

So we do not need "entity_key()".
If "entity_before()" is inline we will also save one subtraction (only one,
because "entity_key(cfs_rq, se)"  was cached in "key")
Signed-off-by: NStephan Baerwolf <stephan.baerwolf@tu-ilmenau.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-ns12mnd2h5w8rb9agd8hnsfk@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

2bd2d6f2

sched: Separate group-scheduling code more clearly · acb5a9ba

由 Jan H. Schönherr 提交于 7月 14, 2011

Clean up cfs/rt runqueue initialization by moving group scheduling
related code into the corresponding functions.

Also, keep group scheduling as an add-on, so that things are only done
additionally, i. e. remove the init_*_rq() calls from init_tg_*_entry().
(This removes a redundant initalization during sched_init()).

In case of group scheduling rt_rq->highest_prio.curr is now initialized
twice, but adding another #ifdef seems not worth it.
Signed-off-by: NJan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1310661163-16606-1-git-send-email-schnhrr@cs.tu-berlin.deSigned-off-by: NIngo Molnar <mingo@elte.hu>

acb5a9ba

sched: Reorder root_domain to remove 64 bit alignment padding · 26a148eb

由 Richard Kennedy 提交于 7月 15, 2011

Reorder root_domain to remove 8 bytes of alignment padding on 64 bit
builds, this shrinks the size from 1736 to 1728 bytes, therefore using
one fewer cachelines.
Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1310726492.1977.5.camel@castor.rskSigned-off-by: NIngo Molnar <mingo@elte.hu>

26a148eb

sched: Do not attempt to destroy uninitialized rt_bandwidth · 99bc5242

由 Bianca Lutz 提交于 7月 13, 2011

If a task group is to be created and alloc_fair_sched_group() fails,
then the rt_bandwidth of the corresponding task group is not yet
initialized. The caller, sched_create_group(), starts a clean up
procedure which calls free_rt_sched_group() which unconditionally
destroys the not yet initialized rt_bandwidth.

This crashes or hangs the system in lock_hrtimer_base(): UP systems
dereference a NULL pointer, while SMP systems loop endlessly on a
condition that cannot become true.

This patch simply avoids the destruction of rt_bandwidth when the
initialization code path was not reached.

(This was discovered by accident with a custom kernel modification.)
Signed-off-by: NBianca Lutz <sowilo@cs.tu-berlin.de>
Signed-off-by: NJan Schoenherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1310580816-10861-7-git-send-email-schnhrr@cs.tu-berlin.deSigned-off-by: NIngo Molnar <mingo@elte.hu>

99bc5242

sched: Remove unused function cpu_cfs_rq() · 045176d2

由 Jan Schoenherr 提交于 7月 13, 2011

The last reference to cpu_cfs_rq() was removed with commit 88ec22d3
("sched: Remove the cfs_rq dependency from set_task_cpu()"). Thus,
remove this function, too.
Signed-off-by: NJan Schoenherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1310580816-10861-3-git-send-email-schnhrr@cs.tu-berlin.deSigned-off-by: NIngo Molnar <mingo@elte.hu>

045176d2

sched: Fix (harmless) typo 'CONFG_FAIR_GROUP_SCHED' · 5f817d67

由 Jan Schoenherr 提交于 7月 13, 2011

This patch fixes a typo located in a comment.
Signed-off-by: NJan Schoenherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1310580816-10861-2-git-send-email-schnhrr@cs.tu-berlin.deSigned-off-by: NIngo Molnar <mingo@elte.hu>

5f817d67

sched, cgroup: Optimize load_balance_fair() · 9763b67f

由 Peter Zijlstra 提交于 7月 13, 2011

Use for_each_leaf_cfs_rq() instead of list_for_each_entry_rcu(), this
achieves that load_balance_fair() only iterates those task_groups that
actually have tasks on busiest, and that we iterate bottom-up, trying to
move light groups before the heavier ones.

No idea if it will actually work out to be beneficial in practice, does
anybody have a cgroup workload that might show a difference one way or
the other?

[ Also move update_h_load to sched_fair.c, loosing #ifdef-ery ]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NPaul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/1310557009.2586.28.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

9763b67f

sched: Don't update shares twice on on_rq parent · 9598c82d

由 Paul Turner 提交于 7月 06, 2011

In dequeue_task_fair() we bail on dequeue when we encounter a parenting entity
with additional weight. However, we perform a double shares update on this
entity as we continue the shares update traversal from this point, despite
dequeue_entity() having already updated its queuing cfs_rq.
Avoid this by starting from the parent when we resume.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110707053059.797714697@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

9598c82d

sched: update correct entity's runtime in check_preempt_wakeup() · 9bbd7374

由 Paul Turner 提交于 7月 05, 2011

While looking at check_preempt_wakeup() I realized that we are
potentially updating the wrong entity in the fair-group scheduling
case. In this case the current task's cfs_rq may not be the same as
the one used for the comparison between the waking task and the
existing task's vruntime.

This potentially results in us using a stale vruntime in the
pre-emption decision, providing a small false preference for the
previous task. The effects of this are bounded since we always
perform a hierarchal update on the tick.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/CAPM31R+2Ke2urUZKao5W92_LupdR4AYEv-EZWiJ3tG=tEes2cw@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

9bbd7374

21 7月, 2011 3 次提交

ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction · 8a352418

由 Oleg Nesterov 提交于 7月 21, 2011

Simple test-case,

	int main(void)
	{
		int pid, status;

		pid = fork();
		if (!pid) {
			pause();
			assert(0);
			return 0x23;
		}

		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
		assert(wait(&status) == pid);
		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);

		kill(pid, SIGCONT);	// <--- also clears STOP_DEQUEUD

		assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
		assert(wait(&status) == pid);
		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGCONT);

		assert(ptrace(PTRACE_CONT, pid, 0, SIGSTOP) == 0);
		assert(wait(&status) == pid);
		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);

		kill(pid, SIGKILL);
		return 0;
	}

Without the patch it hangs. After the patch SIGSTOP "injected" by the
tracer is not ignored and stops the tracee.

Note also that if this test-case uses, say, SIGWINCH instead of SIGCONT,
everything works without the patch. This can't be right, and this is
confusing.

The problem is that SIGSTOP (or any other sig_kernel_stop() signal) has
no effect without JOBCTL_STOP_DEQUEUED. This means it is simply ignored
after PTRACE_CONT unless JOBCTL_STOP_DEQUEUED was set "by accident", say
it wasn't cleared after initial SIGSTOP sent by PTRACE_ATTACH.

At first glance we could change ptrace_signal() to add STOP_DEQUEUED
after return from ptrace_stop(), but this is not right in case when the
tracer does not change the reported SIGSTOP and SIGCONT comes in between.
This is even more wrong with PT_SEIZED, SIGCONT adds JOBCTL_TRAP_NOTIFY
which will be "lost" during the TRAP_STOP | TRAP_NOTIFY report.

So lets add STOP_DEQUEUED _before_ we report the signal. It has no effect
unless sig_kernel_stop() == T after the tracer resumes us, and in the
latter case the pending STOP_DEQUEUED means no SIGCONT in between, we
should stop.

Note also that if SIGCONT was sent, PT_SEIZED tracee will correctly
report PTRACE_EVENT_STOP/SIGTRAP and thus the tracer can notice the fact
SIGSTOP was cancelled.

Also, move the current->ptrace check from ptrace_signal() to its caller,
get_signal_to_deliver(), this looks more natural.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>

8a352418

rw_semaphore: remove up/down_read_non_owner · 11b80f45

由 Christoph Hellwig 提交于 6月 24, 2011

Now that the last users is gone these can be removed.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

11b80f45

time: Fix stupid KERN_WARN compile issue · cbaa5152

由 John Stultz 提交于 7月 20, 2011

Terribly embarassing. Don't know how I committed this, but its
KERN_WARNING not KERN_WARN.

This fixes the following compile error:
kernel/time/timekeeping.c: In function ‘__timekeeping_inject_sleeptime’:
kernel/time/timekeeping.c:608: error: ‘KERN_WARN’ undeclared (first use in this function)
kernel/time/timekeeping.c:608: error: (Each undeclared identifier is reported only once
kernel/time/timekeeping.c:608: error: for each function it appears in.)
kernel/time/timekeeping.c:608: error: expected ‘)’ before string constant
make[2]: *** [kernel/time/timekeeping.o] Error 1
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

cbaa5152