Commits · 4440095c8268c1a5e11577097d2be429cec036ca · OpenHarmony / kernel_linux

Dec 24, 2009 · 1 commit

SYSCTL: Print binary sysctl warnings (nearly) only once · 4440095c

Submitted by Andi Kleen on Dec 23, 2009

When printing legacy sysctls, print the warning message
for each of them only once.  This guarantees that the
syslog won't be flooded by any sane program.

The original attempt at this made the tables non-const and stored
the flag inline.

Linus suggested using a separate hash table for this; this patch is based
on a code snippet from him.

Because a hash is used, this is not exact: a new sysctl can occasionally
go unreported due to a hash collision, but in practice this should not
be a problem.

I used an FNV32 hash over the binary string with a 32-byte bitmap. This
gives relatively few collisions when all the predefined binary sysctls
are hashed:

Hash table size: 256

bucket length   number of buckets
0               25
1               67
2               88
3               47
4               22
5               6
6               1

The worst case is a single collision of 6 hash values.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
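
The warn-once scheme described above can be sketched in user-space C. This is not the kernel's code: `should_warn()`, `fnv1a32()` and `warn_once_bitmap` are hypothetical names, and the sketch assumes FNV-1a applied byte by byte over the integer array that forms a binary sysctl name, with 256 one-bit buckets (the 32-byte bitmap).

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a parameters. */
#define FNV32_OFFSET 2166136261u
#define FNV32_PRIME  16777619u

/* 32-byte bitmap = 256 one-bit buckets. */
#define WARN_ONCE_BITS 256

static uint8_t warn_once_bitmap[WARN_ONCE_BITS / 8];

/* FNV-1a over the raw bytes of a buffer. */
static uint32_t fnv1a32(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t hash = FNV32_OFFSET;

	for (size_t i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= FNV32_PRIME;
	}
	return hash;
}

/*
 * Return 1 the first time a binary sysctl name (an int array) lands in
 * its bucket, 0 afterwards. Two names that collide share one bit, so
 * the second of them is silently never reported.
 */
static int should_warn(const int *name, size_t nlen)
{
	uint32_t bit = fnv1a32(name, nlen * sizeof(*name)) % WARN_ONCE_BITS;
	uint8_t mask = (uint8_t)(1u << (bit % 8));

	if (warn_once_bitmap[bit / 8] & mask)
		return 0;
	warn_once_bitmap[bit / 8] |= mask;
	return 1;
}
```

A bitmap costs one bit per bucket, so the whole "seen" table fits in 32 bytes and never needs allocation; the price is the occasional missed warning the commit message accepts.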

Dec 23, 2009 · 9 commits

kfifo: add record handling functions · 86d48803

Submitted by Stefani Seibold on Dec 21, 2009

Add kfifo_in_rec() - puts some record data into the FIFO
Add kfifo_out_rec() - gets some record data from the FIFO
Add kfifo_from_user_rec() - puts some data from user space into the FIFO
Add kfifo_to_user_rec() - gets data from the FIFO and writes it to user space
Add kfifo_peek_rec() - gets the size of the next FIFO record field
Add kfifo_skip_rec() - skips the next FIFO out record
Add kfifo_avail_rec() - determines the number of bytes available in a record FIFO
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
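
A record FIFO of the kind these functions manage can be sketched as a byte ring in which every record is preceded by a 1-byte length field (the 0-255 byte record variant). All names below (`rec_fifo`, `rec_in()`, `rec_out()`, `rec_peek()`) are hypothetical and do not reproduce the kernel's kfifo signatures or return conventions.

```c
#define REC_FIFO_SIZE 64u /* must be a power of two */

struct rec_fifo {
	unsigned char buf[REC_FIFO_SIZE];
	unsigned int in;  /* free-running write position */
	unsigned int out; /* free-running read position */
};

static unsigned int rec_fifo_used(const struct rec_fifo *f)
{
	return f->in - f->out;
}

/* Copy n bytes in at f->in, wrapping around the ring. */
static void put_bytes(struct rec_fifo *f, const void *src, unsigned int n)
{
	const unsigned char *p = src;

	for (unsigned int i = 0; i < n; i++)
		f->buf[(f->in + i) & (REC_FIFO_SIZE - 1)] = p[i];
	f->in += n;
}

/* Copy n bytes starting at f->out without consuming them. */
static void peek_bytes(const struct rec_fifo *f, void *dst, unsigned int n)
{
	unsigned char *p = dst;

	for (unsigned int i = 0; i < n; i++)
		p[i] = f->buf[(f->out + i) & (REC_FIFO_SIZE - 1)];
}

/* Append one record; returns 0 on success, or len if it does not fit. */
static unsigned int rec_in(struct rec_fifo *f, const void *data, unsigned int len)
{
	unsigned char hdr = (unsigned char)len;

	if (len > 255 || REC_FIFO_SIZE - rec_fifo_used(f) < len + 1)
		return len;
	put_bytes(f, &hdr, 1);
	put_bytes(f, data, len);
	return 0;
}

/* Length of the next record, or -1 if the FIFO is empty. */
static int rec_peek(const struct rec_fifo *f)
{
	unsigned char hdr;

	if (rec_fifo_used(f) == 0)
		return -1;
	peek_bytes(f, &hdr, 1);
	return hdr;
}

/* Remove the next record into dst; returns its length, or -1 if the FIFO
 * is empty or dst is too small (the record is then left in place). */
static int rec_out(struct rec_fifo *f, void *dst, unsigned int size)
{
	int len = rec_peek(f);

	if (len < 0 || (unsigned int)len > size)
		return -1;
	f->out += 1; /* consume the length byte */
	peek_bytes(f, dst, (unsigned int)len);
	f->out += (unsigned int)len;
	return len;
}
```

Because the length travels with the data, a reader can peek at the size of the next record before deciding whether its buffer is large enough, which is what a peek-style function is for.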

kfifo: add kfifo_skip, kfifo_from_user and kfifo_to_user · a121f24a

Submitted by Stefani Seibold on Dec 21, 2009

Add kfifo_reset_out() for a safe lockless discard of the fifo output
Add kfifo_skip() to skip a number of output bytes
Add kfifo_from_user() to copy user space data into the fifo
Add kfifo_to_user() to copy fifo data to user space
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kfifo: rename kfifo_put... into kfifo_in... and kfifo_get... into kfifo_out... · 7acd72eb

Submitted by Stefani Seibold on Dec 21, 2009

Rename kfifo_put... to kfifo_in... to prevent misuse by old out-of-tree
drivers.

Ditto for kfifo_get... -> kfifo_out...

Improve the prototypes of kfifo_in and kfifo_out to make the kerneldoc
annotations more readable.

Add a mini "howto: porting to the new API" to kfifo.h
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kfifo: cleanup namespace · e64c026d

Submitted by Stefani Seibold on Dec 21, 2009

Change the names of the __kfifo_* functions to kfifo_*, because the __kfifo
prefix should be reserved for internal functions only.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kfifo: move out spinlock · c1e13f25

Submitted by Stefani Seibold on Dec 21, 2009

Move the pointer to the spinlock out of struct kfifo.  Most in-tree users
do not actually use a spinlock, so the few exceptions now have to call
kfifo_{get,put}_locked, which takes an extra spinlock argument.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kfifo: move struct kfifo in place · 45465487

Submitted by Stefani Seibold on Dec 21, 2009

This is a new generic kernel FIFO implementation.

The current kernel fifo API is not very widely used, because it has too
many constraints.  Only 17 files in the current 2.6.31-rc5 use it.
FIFOs, like lists, are a very basic thing, and a kfifo API which handles
the most common use cases would save a lot of development time and memory
resources.

I think these are the reasons why kfifo is not in use:

 - The API is too simple; important functions are missing
 - A fifo can only be allocated dynamically
 - A spinlock is required whether you need it or not
 - There is no support for data records inside a fifo

So I decided to extend the kfifo in a more generic way without blowing up
the API too much.  The new API has the following benefits:

 - Generic usage: for kernel-internal use and/or device drivers.
 - Provides an API for the most common use cases.
 - Slim API: the whole API provides 25 functions.
 - Follows Linux coding habits.
 - DECLARE_KFIFO, DEFINE_KFIFO and INIT_KFIFO macros.
 - Direct copy_to_user from the fifo and copy_from_user into the fifo.
 - The kfifo itself is an in-place member of the data structure that uses it; this saves
   an indirection and does not burden the kernel allocator.
 - Lockless access: if only one reader and one writer are active on the fifo,
   which is the common use case, no additional locking is necessary.
 - Spinlock removed: the user is free to choose what kind of locking to use, if
   one is required.
 - Ability to handle records. Three types of records are supported:
   - Variable-length records of 0-255 bytes, with a record size
     field of 1 byte.
   - Variable-length records of 0-65535 bytes, with a record size
     field of 2 bytes.
   - Fixed-size records, with no record size field.
 - Conserves memory.
 - Performance!
 - Easy to use!

This patch:

Since most users want to have the kfifo as part of another object,
reorganize the code to allow including struct kfifo in another data
structure.  This requires changing the kfifo_alloc and kfifo_init
prototypes so that we pass an existing kfifo pointer into them.  This
patch changes the implementation and all existing users.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
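
The in-place layout this patch moves to can be sketched in user space: the FIFO header lives inside the object that owns it, the caller hands in both the header and its storage, and free-running `in`/`out` counters let one producer and one consumer each update only their own counter. The `byte_fifo` and `my_device` names are hypothetical, the memory barriers a real lockless implementation needs are omitted, and the kernel's actual kfifo_init()/kfifo_alloc() signatures are not reproduced.

```c
#include <stdbool.h>

struct byte_fifo {
	unsigned char *buffer;
	unsigned int size; /* power of two, so "& (size - 1)" wraps indices */
	unsigned int in;   /* advanced only by the producer */
	unsigned int out;  /* advanced only by the consumer */
};

/* The caller owns both the fifo header and its storage: no allocation. */
static bool byte_fifo_init(struct byte_fifo *fifo, void *buffer, unsigned int size)
{
	if (size == 0 || (size & (size - 1)) != 0)
		return false;
	fifo->buffer = buffer;
	fifo->size = size;
	fifo->in = 0;
	fifo->out = 0;
	return true;
}

/* Producer side: copy up to len bytes in; returns the number copied. */
static unsigned int byte_fifo_in(struct byte_fifo *fifo, const void *from, unsigned int len)
{
	const unsigned char *src = from;
	unsigned int avail = fifo->size - (fifo->in - fifo->out);

	if (len > avail)
		len = avail;
	for (unsigned int i = 0; i < len; i++)
		fifo->buffer[(fifo->in + i) & (fifo->size - 1)] = src[i];
	fifo->in += len; /* publish; a real lockless version needs a barrier first */
	return len;
}

/* Consumer side: copy up to len bytes out; returns the number copied. */
static unsigned int byte_fifo_out(struct byte_fifo *fifo, void *to, unsigned int len)
{
	unsigned char *dst = to;
	unsigned int used = fifo->in - fifo->out;

	if (len > used)
		len = used;
	for (unsigned int i = 0; i < len; i++)
		dst[i] = fifo->buffer[(fifo->out + i) & (fifo->size - 1)];
	fifo->out += len;
	return len;
}

/* The FIFO is embedded in its owner instead of being a separate allocation. */
struct my_device {
	int id;
	struct byte_fifo rx;
	unsigned char rx_buf[16];
};
```

Embedding the header removes one pointer dereference per access and one allocation per owner, which is the saving the changelog points to.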

Revert "time: Remove xtime_cache" · 83f57a11

Submitted by Linus Torvalds on Dec 22, 2009

This reverts commit 7bc7d637, as
requested by John Stultz. Quoting John:

 "Petr Titěra reported an issue where he saw odd atime regressions with
  2.6.33 where there were a full second worth of nanoseconds in the
  nanoseconds field.

  He also reviewed the time code and narrowed down the problem: unhandled
  overflow of the nanosecond field caused by rounding up the
  sub-nanosecond accumulated time.

  Details:

   * At the end of update_wall_time(), we currently round up the
  sub-nanosecond portion of accumulated time when storing it into xtime.
  This was added to avoid time inconsistencies caused when the
  sub-nanosecond portion was truncated when storing into xtime.
  Unfortunately we don't handle the possible second overflow caused by
  that rounding.

   * Previously the xtime_cache code hid this overflow by normalizing the
  xtime value when storing into the xtime_cache.

   * We could try to handle the second overflow after the rounding up, but
  since this affects the timekeeping's internal state, this would further
  complicate the next accumulation cycle, causing small errors in ntp
  steering. As much as I'd like to get rid of it, the xtime_cache code is
  known to work.

   * The correct fix is really to include the sub-nanosecond portion in the
  timekeeping accessor function, so we don't need to round up during
  accumulation. This would greatly simplify the accumulation code.
  Unfortunately, we can't do this safely until the last three
  non-GENERIC_TIME arches (sparc32, arm, cris) are converted (those
  patches are in -mm) and we kill off the spots where arches set xtime
  directly. This is all 2.6.34 material, so I think reverting the
  xtime_cache change is the best approach for now.

  Many thanks to Petr for both reporting and finding the issue!"
Reported-by: Petr Titěra <P.Titera@century.cz>
Requested-by: john stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
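
The overflow John describes can be reproduced with plain integer arithmetic. The sketch below assumes accumulated time is kept as nanoseconds shifted left by `shift` bits; `struct ts` and `store_rounded()` are hypothetical names, and the `normalize` flag stands in for what, per the quoted analysis, the xtime_cache path effectively did.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

struct ts {
	int64_t sec;
	int64_t nsec;
};

/*
 * Store shifted (sub-nanosecond) accumulated time into a seconds/nanoseconds
 * pair, rounding the fractional nanosecond up. Without normalization, a
 * value just below one second can round up to nsec == NSEC_PER_SEC.
 */
static struct ts store_rounded(int64_t sec, uint64_t shifted_nsec,
			       unsigned int shift, int normalize)
{
	struct ts t;

	t.sec = sec;
	t.nsec = (int64_t)((shifted_nsec + (UINT64_C(1) << shift) - 1) >> shift);
	if (normalize) {
		while (t.nsec >= NSEC_PER_SEC) {
			t.nsec -= NSEC_PER_SEC;
			t.sec++;
		}
	}
	return t;
}
```

With `shift = 8`, a stored value of `(999999999 << 8) + 1` rounds up to exactly one full second of nanoseconds, which is the "full second worth of nanoseconds in the nanoseconds field" from the report.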

Sanitize f_flags helpers · 5300990c

Submitted by Al Viro on Dec 19, 2009

* Pull ACC_MODE into fs.h; we have several copies all over the place.
* The nightmarish expression calculating f_mode from f_flags deserves a
  helper too (OPEN_FMODE(flags)).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
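
The arithmetic behind the two helpers can be checked in user space. This sketch assumes the Linux access-mode encoding (O_RDONLY = 0, O_WRONLY = 1, O_RDWR = 2, O_ACCMODE = 3) and defines its own MAY_* and FMODE_* constants whose values are assumptions here; the definitions in include/linux/fs.h should be treated as authoritative.

```c
#include <fcntl.h>

/* Assumed permission and file-mode bit values (local definitions). */
#define MAY_WRITE   0x2
#define MAY_READ    0x4
#define FMODE_READ  0x1
#define FMODE_WRITE 0x2

/* Open flags -> permission bits to check:
 * O_RDONLY -> MAY_READ, O_WRONLY -> MAY_WRITE, O_RDWR (and 3) -> both. */
#define ACC_MODE(x) ("\004\002\006\006"[(x) & O_ACCMODE])

/* Open flags -> file mode: adding 1 maps 0/1/2 to 1/2/3, i.e.
 * FMODE_READ, FMODE_WRITE, FMODE_READ | FMODE_WRITE. */
#define OPEN_FMODE(flags) (((flags) + 1) & O_ACCMODE)
```

The "+ 1" trick only works because the access-mode values are 0, 1 and 2 and the other open flags sit above the O_ACCMODE bits, so masking after the addition discards them.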

anonfd: Allow making anon files read-only · 628ff7c1

Submitted by Roland Dreier on Dec 18, 2009

It seems a couple of places, such as arch/ia64/kernel/perfmon.c and
drivers/infiniband/core/uverbs_main.c, could use anon_inode_getfile()
instead of a private pseudo-fs + alloc_file(), if only there were a way
to get a read-only file.  So provide this by having anon_inode_getfile()
create a read-only file if we pass O_RDONLY in flags.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Dec 22, 2009 · 1 commit

resources: fix call to alignf() in allocate_resource() · 0e2c8b8f

Submitted by Dominik Brodowski on Dec 20, 2009

The second parameter to alignf() in allocate_resource() must
reflect the new resource that is being allocated; otherwise
functions like pcibios_align_resource() (at least on x86) or
pcmcia_align() can't work correctly.

Commit 1e5ad967 broke this by not setting the "new" resource
until we're about to return success. To keep the resource
untouched when allocate_resource() fails, a "tmp" resource is
introduced.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
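
The "tmp" pattern can be sketched with a single free window instead of the kernel's walk over existing child resources. `struct res`, `alloc_in_window()` and `align_up()` are hypothetical names, and the callback signature is simplified from the real alignf().

```c
#include <stddef.h>
#include <stdint.h>

struct res {
	uint64_t start;
	uint64_t end;
};

/* Alignment callback: sees the candidate currently being tried. */
typedef uint64_t (*alignf_t)(void *data, const struct res *candidate,
			     uint64_t size, uint64_t align);

/* Example callback: round the candidate start up to `align` (a power of two). */
static uint64_t align_up(void *data, const struct res *candidate,
			 uint64_t size, uint64_t align)
{
	(void)data;
	(void)size;
	return (candidate->start + align - 1) & ~(align - 1);
}

/*
 * Place `size` bytes inside the free window [min, max]. All work happens
 * on `tmp`, so alignf() sees the resource actually being attempted, and
 * *new_res is written only on success: a failed call leaves it untouched.
 */
static int alloc_in_window(struct res *new_res, uint64_t min, uint64_t max,
			   uint64_t size, uint64_t align,
			   alignf_t alignf, void *data)
{
	struct res tmp = { .start = min, .end = max };

	tmp.start = alignf(data, &tmp, size, align);
	if (size == 0 || tmp.start < min || tmp.start > max ||
	    max - tmp.start + 1 < size)
		return -1; /* allocation failed */
	tmp.end = tmp.start + size - 1;
	*new_res = tmp;
	return 0;
}
```

Passing `&tmp` rather than the caller's resource is the point of the fix: the callback needs the candidate's current bounds, while the caller's object must stay unchanged until success is certain.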

Dec 21, 2009 · 2 commits

sched: Fix hotplug hang · 70f11205

Submitted by Peter Zijlstra on Dec 20, 2009

The hot-unplug kstopmachine usage does a wakeup after
deactivating the CPU, hence we cannot use cpu_active()
here but must rely on the good old online mask.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1261326987.4314.24.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Restore printk sanity · 3df0fc5b

Submitted by Peter Zijlstra on Dec 20, 2009

Revert the braindead pr_* crap. (Commit 663997d4 "sched: Use
pr_fmt() and pr_<level>()")

It's dumb and causes stupid "sched: " strings all over the place.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mike Galbraith <efault@gmx.de>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1261315437.4314.6.camel@laptop>
[ I don't mind the pr_*() patterns that much - but Peter dislikes them with a vengeance. ]
[ - v2: remove spurious diffstat from changelog :-/ ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Dec 20, 2009 · 2 commits

fix more leaks in audit_tree.c tag_chunk() · b4c30aad

Submitted by Al Viro on Dec 19, 2009

Several leaks in audit_tree didn't get caught by commit
318b6d3d, including the leak on normal
exit in case of multiple rules referring to the same chunk.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix braindamage in audit_tree.c untag_chunk() · 6f5d5114

Submitted by Al Viro on Dec 19, 2009

... aka "Al had badly fscked up when writing that thing and nobody
noticed until Eric had fixed leaks that used to mask the breakage".

The function essentially creates a copy of old array sans one element
and replaces the references to elements of original (they are on cyclic
lists) with those to corresponding elements of new one.  After that the
old one is fair game for freeing.

First of all, there's a dumb braino: when we get to list_replace_init we
use indices for wrong arrays - position in new one with the old array
and vice versa.

Another bug is more subtle - termination condition is wrong if the
element to be excluded happens to be the last one.  We shouldn't go
until we fill the new array, we should go until we'd finished the old
one.  Otherwise the element we are trying to kill will remain on the
cyclic lists...

That crap used to be masked by several leaks, so it was not quite
trivial to hit.  Eric had fixed some of those leaks a while ago and the
shit had hit the fan...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
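
The corrected loop shape can be sketched on plain integer arrays, with the cyclic-list replacement reduced to a copy. `copy_without()` is a hypothetical helper, not the audit_tree.c code; its return value only records whether the loop reached the element that the real function must unlink.

```c
#include <stddef.h>

/*
 * Build new_arr[] = old[] without old[skip]. Returns 1 if the loop reached
 * the excluded element (where the real code must unlink it), else 0.
 */
static int copy_without(int *new_arr, const int *old, size_t n, size_t skip)
{
	int victim_seen = 0;
	size_t j = 0; /* index into the new array */

	/* Terminate on the OLD array: stopping once new_arr is full would
	 * never reach old[skip] when skip == n - 1. */
	for (size_t i = 0; i < n; i++) {
		if (i == skip) {
			victim_seen = 1;
			continue;
		}
		new_arr[j++] = old[i]; /* new index j pairs with old index i */
	}
	return victim_seen;
}
```

Keeping two named indices, one per array, also makes the first bug (using the new array's position to index the old array and vice versa) visible at the assignment.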

Dec 18, 2009 · 3 commits

printk: fix new kernel-doc warnings · 6485536b

Submitted by Randy Dunlap on Dec 17, 2009

Fix kernel-doc warnings in printk.c:

Warning(kernel/printk.c:1422): No description found for parameter 'dumper'
Warning(kernel/printk.c:1422): Excess function parameter 'dump' description in 'kmsg_dump_register'
Warning(kernel/printk.c:1451): No description found for parameter 'dumper'
Warning(kernel/printk.c:1451): Excess function parameter 'dump' description in 'kmsg_dump_unregister'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

do_wait() optimization: do not place sub-threads on task_struct->children list · 9cd80bbb

Submitted by Oleg Nesterov on Dec 17, 2009

Thanks to Roland who pointed out de_thread() issues.

Currently we add sub-threads to ->real_parent->children list.  This buys
nothing but slows down do_wait().

With this patch ->children contains only main threads (group leaders).
The only complication is that forget_original_parent() should iterate over
sub-threads by hand, and de_thread() needs another list_replace() when it
changes ->group_leader.

Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
tasks, so we can remove this check (and we can unify do_wait_thread() and
ptrace_do_wait()).

This change can confuse the optimistic search in mm_update_next_owner(),
but this is fixable and minor.

Perhaps badness() and oom_kill_process() should be updated, but they
should be fixed in any case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ratan Nalumasu <rnalumasu@gmail.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/sysctl.c: fix the incomplete part of sysctl_max_map_count-should-be-non-negative.patch · 3e26120c

Submitted by WANG Cong on Dec 17, 2009

It was a mistake to use 'proc_dointvec'; it should be
'proc_dointvec_minmax', as in the original patch.
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
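
What the min/max variant adds can be sketched as a range check before the store. `write_int_minmax()` is a hypothetical user-space helper, not the kernel's proc handler; it only illustrates how a lower bound of zero (as intended for a non-negative max_map_count) keeps a negative write from reaching the variable.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a decimal integer and store it only if it lies within the
 * optional [*min, *max] bounds (NULL means unbounded on that side).
 */
static int write_int_minmax(int *valp, const char *buf,
			    const int *min, const int *max)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (errno != 0 || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	if ((min && v < *min) || (max && v > *max))
		return -EINVAL;
	*valp = (int)v;
	return 0;
}
```

A plain integer handler corresponds to calling this with both bounds NULL, which is why registering the wrong handler silently dropped the intended lower bound.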

Dec 17, 2009 · 22 commits

sched: Fix broken assertion · 077614ee

Submitted by Peter Zijlstra on Dec 17, 2009

There's a preemption race in the set_task_cpu() debug check:
when we get preempted after setting task->state, we'd still
be on the rq proper, but fail the test.

Check for preempted tasks, since those are always on the RQ.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20091217121830.137155561@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

perf events: Dont report side-band events on each cpu for per-task-per-cpu events · 5d27c23d

Submitted by Peter Zijlstra on Dec 17, 2009

Acme noticed that his FORK/MMAP numbers were inflated by about
the same factor as his cpu-count.

This led to the discovery of a few more sites that need to
respect the event->cpu filter.
Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20091217121830.215333434@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
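
The filter can be sketched as follows, assuming a minimal hypothetical event struct in which `cpu == -1` means "not bound to a CPU"; the names are illustrative and do not reproduce perf's internals.

```c
#include <stdbool.h>

struct event {
	int cpu; /* -1: counts on any CPU; otherwise bound to that CPU */
};

static bool event_matches_cpu(const struct event *ev, int this_cpu)
{
	return ev->cpu == -1 || ev->cpu == this_cpu;
}

/* Count how many events receive a side-band record (e.g. FORK or MMAP)
 * generated on this_cpu. */
static int deliver_sideband(const struct event *evs, int n, int this_cpu)
{
	int delivered = 0;

	for (int i = 0; i < n; i++)
		if (event_matches_cpu(&evs[i], this_cpu))
			delivered++;
	return delivered;
}
```

Without the filter, a task with one event per CPU on a four-CPU machine would receive each side-band record four times, the same inflation-by-CPU-count that was observed.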

perf events, x86/stacktrace: Make stack walking optional · 61c1917f

Submitted by Frederic Weisbecker on Dec 17, 2009

The current print_context_stack helper that does the stack
walking job is good for usual stacktraces as it walks through
all the stack and reports even addresses that look unreliable,
which is nice when we don't have frame pointers for example.

But we have users like perf that only require reliable
stacktraces, and those may want a better-suited stack walker, so
let's make this function a callback in stacktrace_ops that users
can tune for their needs.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
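
Turning the walker into a callback can be sketched like this. `struct walk_ops`, `walk_all()`, `walk_reliable()` and `dump_trace_with()` are hypothetical stand-ins for stacktrace_ops and print_context_stack, and a pre-decoded frame list replaces real stack memory.

```c
#include <stddef.h>

/* Hypothetical: a pre-decoded list of frames stands in for raw stack memory. */
struct frame {
	unsigned long addr;
	int reliable;
};

struct walk_ops;

typedef size_t (*walk_stack_t)(const struct frame *stack, size_t n,
			       const struct walk_ops *ops, void *data);

struct walk_ops {
	/* Called for each address the walker decides to report. */
	void (*address)(void *data, unsigned long addr, int reliable);
	/* The walker itself is pluggable. */
	walk_stack_t walk_stack;
};

/* Default walker: reports every entry, reliable or not. */
static size_t walk_all(const struct frame *stack, size_t n,
		       const struct walk_ops *ops, void *data)
{
	for (size_t i = 0; i < n; i++)
		ops->address(data, stack[i].addr, stack[i].reliable);
	return n;
}

/* Stricter walker for users such as a profiler: reliable entries only. */
static size_t walk_reliable(const struct frame *stack, size_t n,
			    const struct walk_ops *ops, void *data)
{
	size_t reported = 0;

	for (size_t i = 0; i < n; i++) {
		if (!stack[i].reliable)
			continue;
		ops->address(data, stack[i].addr, 1);
		reported++;
	}
	return reported;
}

/* Generic entry point: delegates the walk to whatever the caller chose. */
static size_t dump_trace_with(const struct walk_ops *ops,
			      const struct frame *stack, size_t n, void *data)
{
	return ops->walk_stack(stack, n, ops, data);
}

/* Example address callback: count reported addresses. */
static void count_address(void *data, unsigned long addr, int reliable)
{
	(void)addr;
	(void)reliable;
	++*(size_t *)data;
}
```

The generic entry point stays the same for every user; only the ops table decides whether unreliable addresses are reported.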

sched: Teach might_sleep() about preemptible RCU · 234da7bc

Submitted by Frederic Weisbecker on Dec 16, 2009

In practice, it is harmless to voluntarily sleep in an
rcu_read_lock() section if we are running under preemptible RCU, but
it is illegal if we build a kernel running non-preemptible RCU.

Currently, might_sleep() doesn't notice sleepable operations
under rcu_read_lock() sections if we are running under
preemptible RCU, because preempt_count() is left untouched after
rcu_read_lock() in this case. But we want developers who test
their changes under such a config to notice the "sleeping while
atomic" issues.

So we add rcu_read_lock_nesting to preempt_count() in
might_sleep() checks.

[ v2: Handle rcu-tiny ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1260991265-8451-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
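
The effect of the change can be sketched as two versions of the check. `struct task_counters` and both functions are hypothetical simplifications, not the kernel's might_sleep() implementation, and both counters are assumed to be non-negative.

```c
#include <stdbool.h>

/* Hypothetical per-task counters. */
struct task_counters {
	int preempt_count;         /* raised by spin_lock(), preempt_disable(), ... */
	int rcu_read_lock_nesting; /* raised by rcu_read_lock() under preemptible RCU */
};

/* Old check: preemptible RCU leaves preempt_count alone, so sleeping
 * inside rcu_read_lock() goes unnoticed. */
static bool might_sleep_warns_old(const struct task_counters *t)
{
	return t->preempt_count != 0;
}

/* New check: an open RCU read-side section also counts as "atomic". */
static bool might_sleep_warns_new(const struct task_counters *t)
{
	return t->preempt_count + t->rcu_read_lock_nesting != 0;
}
```

A task holding only an RCU read lock (`{ 0, 1 }`) passes the old check silently but trips the new one, which is the case the patch wants developers on preemptible-RCU configs to see.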

kprobe-tracer: Check new event/group name · 6f3cf440

Submitted by Masami Hiramatsu on Dec 16, 2009

Check that the new event/group name has the same syntax as a C symbol.
In other words, check that the name looks like the names of other
tracepoint events.

This prevents users from creating an event with a useless name (e.g.
foo|bar, foo*bar).
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20091216222408.14459.68790.stgit@dhcp-100-2-132.bos.redhat.com>
[ v2: minor cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
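
A check of this kind can be sketched with <ctype.h>. `is_c_symbol()` is a hypothetical name, not the kprobe-tracer function, and the sketch assumes the C locale so that isalpha()/isalnum() accept only ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if name looks like a C identifier: [A-Za-z_][A-Za-z0-9_]*. */
static bool is_c_symbol(const char *name)
{
	if (name == NULL || !(isalpha((unsigned char)*name) || *name == '_'))
		return false;
	for (name++; *name != '\0'; name++)
		if (!(isalnum((unsigned char)*name) || *name == '_'))
			return false;
	return true;
}
```

Names such as `foo|bar` or `foo*bar` fail on the first non-identifier character, and an empty string fails the first-character test.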

sched: Make warning less noisy · 416eb395

Submitted by Ingo Molnar on Dec 17, 2009

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cpumask: avoid dereferencing struct cpumask · 62ac1279

Submitted by Rusty Russell on Dec 17, 2009

struct cpumask will soon be undefined with CONFIG_CPUMASK_OFFSTACK=y,
to prevent cpumasks from being declared on the stack.

cpumask_bits() does what we want here (of course, this code is crap).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: Thomas Gleixner <tglx@linutronix.de>

cpumask: use cpu_online in kernel/perf_event.c · f6325e30

Submitted by Rusty Russell on Dec 17, 2009

Also, we want to check against nr_cpu_ids, not num_possible_cpus().
The latter works, but the correct bounds check is < nr_cpu_ids.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: Thomas Gleixner <tglx@linutronix.de>

timers: Remove duplicate setting of new_base in __mod_timer() · cf1e367e

Submitted by Simon Horman on Dec 17, 2009

new_base is set using per_cpu(tvec_bases, cpu) after selecting the
desired value of cpu immediately below, so this line is unnecessary.
Signed-off-by: Simon Horman <horms@verge.net.au>
LKML-Reference: <20091217001542.GD25317@verge.net.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

NOMMU: Optimise away the {dac_,}mmap_min_addr tests · 6e141546

Submitted by David Howells on Dec 15, 2009

In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be
skipped by the compiler.  We do this as the minimum mmap address doesn't make
any sense in NOMMU mode.

mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

[sysctl] Fix breakage on systems with older glibc · 61cf6931

Submitted by Andi Kleen on Dec 16, 2009

As predicted during code review, the sysctl(2) changes made systems with
old glibc nearly unusable.  About every command gives a:

  warning: process `ls' used the deprecated sysctl system call with 1.4

warning in the log.

I see this on a SUSE 10.0 system with glibc 2.3.5.

Don't warn for this common case.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sched: Simplify set_task_cpu() · 738d2be4

Submitted by Peter Zijlstra on Dec 16, 2009

Rearrange the code a bit now that it's a simpler function.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.269101883@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Remove the cfs_rq dependency from set_task_cpu() · 88ec22d3

Submitted by Peter Zijlstra on Dec 16, 2009

In order to remove the cfs_rq dependency from set_task_cpu() we
need to ensure the task is cfs_rq invariant for all callsites.

The simple approach is to subtract cfs_rq->min_vruntime from
se->vruntime on dequeue, and add cfs_rq->min_vruntime on
enqueue.

However, this has the downside of breaking FAIR_SLEEPERS, since
we lose the old vruntime as we only maintain the relative
position.

To solve this, we observe that we only migrate runnable tasks,
and we do this using deactivate_task(.sleep=0) and
activate_task(.wakeup=0); therefore we can restrict the
min_vruntime invariance to that state.

The only other case is wakeup balancing, since we want to
maintain the old vruntime we cannot make it relative on dequeue,
but since we don't migrate inactive tasks, we can do so right
before we activate it again.

This is where we need the new pre-wakeup hook, we need to call
this while still holding the old rq->lock. We could fold it into
->select_task_rq(), but since that has multiple callsites and
would obfuscate the locking requirements, that seems like a
fudge.

This leaves the fork() case, simply make sure that ->task_fork()
leaves the ->vruntime in a relative state.

This covers all cases where set_task_cpu() gets called, and
ensures it sees a relative vruntime.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.191697025@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
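
The dequeue/enqueue bookkeeping can be sketched with unsigned arithmetic, where a temporary wrap-around (when vruntime is below min_vruntime) cancels out modulo 2^64. `struct cfs_queue`, `struct sched_ent`, `make_relative()` and `make_absolute()` are hypothetical names, not sched_fair.c code.

```c
#include <stdint.h>

/* Hypothetical minimal stand-ins for a CFS runqueue and a scheduling entity. */
struct cfs_queue {
	uint64_t min_vruntime;
};

struct sched_ent {
	uint64_t vruntime;
};

/* On dequeue for migration, keep only the position relative to the queue. */
static void make_relative(struct sched_ent *se, const struct cfs_queue *src)
{
	se->vruntime -= src->min_vruntime; /* may wrap; undone by the addition below */
}

/* On enqueue, rebase the relative position onto the destination queue. */
static void make_absolute(struct sched_ent *se, const struct cfs_queue *dst)
{
	se->vruntime += dst->min_vruntime;
}
```

A task 10 units past the source queue's min_vruntime lands 10 units past the destination's, so its fairness position survives the move even though the two queues' clocks differ.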

sched: Add pre and post wakeup hooks · efbbd05a

Submitted by Peter Zijlstra on Dec 16, 2009

As will be apparent in the next patch, we need a pre-wakeup hook
for sched_fair task migration, hence rename the post-wakeup hook
and add a pre-wakeup one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.114746117@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Move kthread_bind() back to kthread.c · 881232b7

Submitted by Peter Zijlstra on Dec 16, 2009

Since kthread_bind() lost its dependencies on sched.c, move it
back where it came from.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.039524041@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Fix select_task_rq() vs hotplug issues · 5da9a0fb

Submitted by Peter Zijlstra on Dec 16, 2009

Since select_task_rq() is now responsible for guaranteeing
->cpus_allowed and cpu_active_mask, we need to verify this.

select_task_rq_rt() can blindly return
smp_processor_id()/task_cpu() without checking the valid masks;
select_task_rq_fair() can do the same in the rare case that all
SD_flags are disabled.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.961475466@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
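
The validation step can be sketched with CPU masks as plain 64-bit integers. `validate_cpu()` and its lowest-CPU fallback are hypothetical choices for illustration; the kernel's actual fallback policy is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical: CPU sets as 64-bit masks, bit n = CPU n. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Accept a class-specific pick (e.g. "stay on task_cpu()") only if it is
 * in both cpus_allowed and the active mask; otherwise fall back to the
 * lowest-numbered CPU that qualifies, or -1 if none does.
 */
static int validate_cpu(int chosen, uint64_t cpus_allowed, uint64_t cpu_active)
{
	uint64_t ok = cpus_allowed & cpu_active;

	if (chosen >= 0 && chosen < 64 && (ok & (UINT64_C(1) << chosen)))
		return chosen;
	return first_cpu(ok);
}
```

For example, with cpus_allowed = 0xC (CPUs 2 and 3) and an active mask of 0x7 (CPUs 0-2), a blind pick of CPU 3 is rejected and CPU 2 is used instead.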

sched: Fix sched_exec() balancing · 38022906

Submitted by Peter Zijlstra on Dec 16, 2009

Since we access ->cpus_allowed without holding rq->lock, we need
a retry loop to validate the result. This comes nearly for free
when we merge sched_migrate_task() into sched_exec(), since that
already does the needed check.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.884743662@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Ensure set_task_cpu() is never called on blocked tasks · e2912009

Submitted by Peter Zijlstra on Dec 16, 2009

In order to clean up the set_task_cpu() rq dependencies we need
to ensure it is never called on blocked tasks because such usage
does not pair with consistent rq->lock usage.

This puts the migration burden on ttwu().

Furthermore we need to close a race against changing
->cpus_allowed, since select_task_rq() runs with only preemption
disabled.

For sched_fork() this is safe because the child isn't in the
tasklist yet; for wakeup we fix this by synchronizing
set_cpus_allowed_ptr() against TASK_WAKING, which leaves
sched_exec() as the remaining problem.

This also closes a hole in (6ad4c188 sched: Fix balance vs
hotplug race) where ->select_task_rq() doesn't validate the
result against the sched_domain/root_domain.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Use TASK_WAKING for fork wakups · 06b83b5f

Submitted by Peter Zijlstra on Dec 16, 2009

For later convenience use TASK_WAKING for fresh tasks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.732561278@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE · e4f42888

Submitted by Peter Zijlstra on Dec 16, 2009

We should skip !SD_LOAD_BALANCE domains.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.653578430@chello.nl>
CC: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Fix task_hot() test order · e6c8fba7

Submitted by Peter Zijlstra on Dec 16, 2009

Make sure not to access sched_fair fields before verifying it is
indeed a sched_fair task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
CC: stable@kernel.org
LKML-Reference: <20091216170517.577998058@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Fix set_cpu_active() in cpu_down() · 9ee349ad

Submitted by Xiaotian Feng on Dec 16, 2009

Sachin found cpu hotplug test failures on powerpc, which made
the kernel hang on his POWER box.

The problem is that we fail to re-activate a cpu when a
hot-unplug fails. Fix this by moving the de-activation into
_cpu_down after doing the initial checks.

Remove the synchronize_sched() calls and rely on those implied
by rebuilding the sched domains using the new mask.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.500272612@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

OpenHarmony / kernel_linux: last synced more than 3 years ago