提交 · ee74d13229fb606353ff56f4927fa93b37e95bbe · OpenHarmony / kernel_linux

25 5月, 2012 1 次提交

smpboot, idle: Optimize calls to smp_processor_id() in idle_threads_init() · ee74d132

由 Srivatsa S. Bhat 提交于 5月 24, 2012

While trying to initialize idle threads for all cpus, idle_threads_init()
calls smp_processor_id() in a loop, which is unnecessary. The intent
is to initialize idle threads for all non-boot cpus. So just use a variable
to note the boot cpu and use it in the loop.
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: suresh.b.siddha@intel.com
Cc: venki@google.com
Cc: nikunj@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20120524151055.2549.64309.stgit@srivatsabhat.in.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

ee74d132

23 5月, 2012 2 次提交

Revert "sched, perf: Use a single callback into the scheduler" · ab0cce56

由 Jiri Olsa 提交于 5月 23, 2012

This reverts commit cb04ff9a ("sched, perf: Use a single
callback into the scheduler").

Before this change was introduced, the process switch worked
like this (wrt. to perf event schedule):

     schedule (prev, next)
       - schedule out all perf events for prev
       - switch to next
       - schedule in all perf events for current (next)

After the commit, the process switch looks like:

     schedule (prev, next)
       - schedule out all perf events for prev
       - schedule in all perf events for (next)
       - switch to next

The problem is, that after we schedule perf events in, the pmu
is enabled and we can receive events even before we make the
switch to next - so "current" still being prev process (event
SAMPLE data are filled based on the value of the "current"
process).

Thats exactly what we see for test__PERF_RECORD test. We receive
SAMPLES with PID of the process that our tracee is scheduled
from.

Discussed with Peter Zijlstra:

 > Bah!, yeah I guess reverting is the right thing for now. Sad
 > though.
 >
 > So by having the two hooks we have a black-spot between them
 > where we receive no events at all, this black-spot covers the
 > hand-over of current and we thus don't receive the 'wrong'
 > events.
 >
 > I rather liked we could do away with both that black-spot and
 > clean up the code a little, but apparently people rely on it.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: acme@redhat.com
Cc: paulus@samba.org
Cc: cjashfor@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20120523111302.GC1638@m.brq.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ab0cce56

Guard check in module loader against integer overflow · ef26a5a6

由 David Howells 提交于 5月 22, 2012

The check:

	if (len < hdr->e_shoff + hdr->e_shnum * sizeof(Elf_Shdr))

may not work if there's an overflow in the right-hand side of the condition.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

ef26a5a6

22 5月, 2012 2 次提交

new helper: sigsuspend() · 68f3f16d

由 Al Viro 提交于 5月 21, 2012

guts of saved_sigmask-based sigsuspend/rt_sigsuspend.  Takes
kernel sigset_t *.

Open-coded instances replaced with calling it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

68f3f16d

irq: Remove irq_chip->release() · 87568264

由 Richard Weinberger 提交于 4月 17, 2012

As it's only user (UML) does no longer need it we can get
rid of it.
Signed-off-by: NRichard Weinberger <richard@nod.at>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>

87568264

20 5月, 2012 1 次提交

userns: Silence silly gcc warning. · 4b06a81f

由 Eric W. Biederman 提交于 5月 19, 2012

On 32bit builds gcc says:
kernel/user.c:30:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
kernel/user.c:38:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]

Silence gcc by changing the constant 4294967295 to 4294967295U.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

4b06a81f

19 5月, 2012 4 次提交

tracing: Remove kernel_lock annotations · 895b67fd

由 Richard Weinberger 提交于 11月 07, 2011

The BKL is gone, these annotations are useless.

Link: http://lkml.kernel.org/r/1320654202-4433-1-git-send-email-richard@nod.atSigned-off-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

895b67fd

tracing: Fix initial buffer_size_kb state · a591c73f

由 Vaibhav Nagarnaik 提交于 5月 03, 2012

Make sure that the state of buffer_size_kb is initialized correctly and
returns actual size of the ring buffer.

Link: http://lkml.kernel.org/r/1336066834-1673-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a591c73f

ring-buffer: Merge separate resize loops · 05fdd70d

由 Vaibhav Nagarnaik 提交于 5月 18, 2012

There are 2 separate loops to resize cpu buffers that are online and
offline. Merge them to make the code look better.

Also change the name from update_completion to update_done to allow
shorter lines.

Link: http://lkml.kernel.org/r/1337372991-14783-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

05fdd70d

PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format · 2df83fa4

由 Minho Ban 提交于 5月 14, 2012

Sometimes resume= parameter comes in integer style (e.g. major:minor)
and then name_to_dev_t can not detect partition properly. (especially
async device like usb, mmc).

This patch calls get_gendisk() if resumewait is true and resume_file
is in integer format to work around this problem.
Signed-off-by: NMinho Ban <mhban@samsung.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

2df83fa4

18 5月, 2012 2 次提交

sched: Taint kernel with TAINT_WARN after sleep-in-atomic bug · 1c2927f1

由 Konstantin Khlebnikov 提交于 5月 10, 2012

Usually sleep-in-atomic bugs are followed by dozens other warnings.
This patch should help to figure out original source of problem.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120510122004.4873.12726.stgit@zurgSigned-off-by: NIngo Molnar <mingo@kernel.org>

1c2927f1

cred: use correct cred accessor with regards to rcu read lock · 8ca937a6

由 Sasha Levin 提交于 5月 17, 2012

Commit "userns: Convert setting and getting uid and gid system calls to use
kuid and kgid has modified the accessors in wait_task_continued() and
wait_task_stopped() to use __task_cred() instead of task_uid().

__task_cred() assumes that we're inside a rcu read lock, which is untrue
for these two functions.

Modify it to use task_uid() instead.
Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

8ca937a6

17 5月, 2012 17 次提交

sched: Remove stale power aware scheduling remnants and dysfunctional knobs · 8e7fbcbc

由 Peter Zijlstra 提交于 1月 09, 2012

It's been broken forever (i.e. it's not scheduling in a power
aware fashion), as reported by Suresh and others sending
patches, and nobody cares enough to fix it properly ...
so remove it to make space free for something better.

There's various problems with the code as it stands today, first
and foremost the user interface which is bound to topology
levels and has multiple values per level. This results in a
state explosion which the administrator or distro needs to
master and almost nobody does.

Furthermore large configuration state spaces aren't good, it
means the thing doesn't just work right because it's either
under so many impossibe to meet constraints, or even if
there's an achievable state workloads have to be aware of
it precisely and can never meet it for dynamic workloads.

So pushing this kind of decision to user-space was a bad idea
even with a single knob - it's exponentially worse with knobs
on every node of the topology.

There is a proposal to replace the user interface with a single
3 state knob:

 sched_balance_policy := { performance, power, auto }

where 'auto' would be the preferred default which looks at things
like Battery/AC mode and possible cpufreq state or whatever the hw
exposes to show us power use expectations - but there's been no
progress on it in the past many months.

Aside from that, the actual implementation of the various knobs
is known to be broken. There have been sporadic attempts at
fixing things but these always stop short of reaching a mergable
state.

Therefore this wholesale removal with the hopes of spurring
people who care to come forward once again and work on a
coherent replacement.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>

8e7fbcbc

ftrace: Remove selecting FRAME_POINTER with FUNCTION_TRACER · b732d439

由 Steven Rostedt 提交于 4月 30, 2012

The function tracer will enable the -pg option with gcc, which requires
that frame pointers. When FRAME_POINTER is defined in the kernel config
it adds the gcc option -fno-omit-frame-pointer which causes some problems
on some architectures. For those architectures, the FRAME_POINTER select
was not set.

When FUNCTION_TRACER was selected on these architectures that can not have
-fno-omit-frame-pointer, the -pg option is still set. But when
FRAME_POINTER is not selected, the kernel config would add the gcc option
-fomit-frame-pointer. Adding this option is incompatible with -pg
even on archs that do not need frame pointers with -pg.

The answer to this was to just not add either -fno-omit-frame-pointer
or -fomit-frame-pointer on these archs that want function tracing
but do not set FRAME_POINTER.

As it turns out, for archs that require frame pointers for function
tracing, the same can be used. If gcc requires frame pointers with
-pg, it will simply add it. The best thing to do is not select FRAME_POINTER
when function tracing is selected, and let gcc add it if needed.

Only add the -fno-omit-frame-pointer when something else selects
FRAME_POINTER, but do not add -fomit-frame-pointer if function tracing
is selected.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

b732d439

ftrace/x86: Have x86 ftrace use the ftrace_modify_all_code() · e4f5d544

由 Steven Rostedt 提交于 4月 27, 2012

To remove duplicate code, have the ftrace arch_ftrace_update_code()
use the generic ftrace_modify_all_code(). This requires that the
default ftrace_replace_code() becomes a weak function so that an
arch may override it.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e4f5d544

ftrace: Make ftrace_modify_all_code() global for archs to use · 8ed3e2cf

由 Steven Rostedt 提交于 4月 26, 2012

Rename __ftrace_modify_code() to ftrace_modify_all_code() and make
it global for all archs to use. This will remove the duplication
of code, as archs that can modify code without stop_machine()
can use it directly outside of the stop_machine() call.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8ed3e2cf

ftrace: Return record ip addr for ftrace_location() · f0cf973a

由 Steven Rostedt 提交于 4月 25, 2012

ftrace_location() is passed an addr, and returns 1 if the addr is
on a ftrace nop (or caller to ftrace_caller), and 0 otherwise.

To let kprobes know if it should move a breakpoint or not, it
must return the actual addr that is the start of the ftrace nop.
This way a kprobe placed on the location of a ftrace nop, can
instead be placed on the instruction after the nop. Even if the
probe addr is on the second or later byte of the nop, it can
simply be moved forward.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f0cf973a

ftrace: Consolidate ftrace_location() and ftrace_text_reserved() · a650e02a

由 Steven Rostedt 提交于 4月 25, 2012

Both ftrace_location() and ftrace_text_reserved() do basically the same thing.
They search to see if an address is in the ftace table (contains an address
that may change from nop to call ftrace_caller). The difference is
that ftrace_location() searches a single address, but ftrace_text_reserved()
searches a range.

This also makes the ftrace_text_reserved() faster as it now uses a bsearch()
instead of linearly searching all the addresses within a page.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a650e02a

ftrace: Speed up search by skipping pages by address · 9644302e

由 Steven Rostedt 提交于 4月 25, 2012

As all records in a page of the ftrace table are sorted, we can
speed up the search algorithm by checking if the address to look for
falls in between the first and last record ip on the page.

This speeds up both the ftrace_location() and ftrace_text_reserved()
algorithms, as it can skip full pages when the search address is
not in them.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9644302e

ftrace: Remove extra helper functions · 706c81f8

由 Steven Rostedt 提交于 4月 24, 2012

The ftrace_record_ip() and ftrace_alloc_dyn_node() were from the
time of the ftrace daemon. Although they were still used, they
still make things a bit more complex than necessary.

Move the code into the one function that uses it, and remove the
helper functions.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

706c81f8

ftrace: Sort all function addresses, not just per page · 9fd49328

由 Steven Rostedt 提交于 4月 24, 2012

Instead of just sorting the ip's of the functions per ftrace page,
sort the entire list before adding them to the ftrace pages.

This will allow the bsearch algorithm to be sped up as it can
also sort by pages, not just records within a page.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9fd49328

tracing: change CPU ring buffer state from tracing_cpumask · 71babb27

由 Vaibhav Nagarnaik 提交于 5月 03, 2012

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables CPU recording as its corresponding bit in
tracing_cpumask is set/unset.

Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

71babb27

tracing: Check return value of tracing_dentry_percpu() · 0a3d7ce7

由 Namhyung Kim 提交于 4月 23, 2012

If tracing_dentry_percpu() failed, tracing_init_debugfs_percpu()
will try to create each cpu directories on debugfs' root directory
as d_percpu is NULL.

Link: http://lkml.kernel.org/r/1335143517-2285-1-git-send-email-namhyung.kim@lge.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: NNamhyung Kim <namhyung.kim@lge.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

0a3d7ce7

ring-buffer: Reset head page before running self test · 308f7eeb

由 Steven Rostedt 提交于 5月 16, 2012

When the ring buffer does its consistency test on itself, it
removes the head page, runs the tests, and then adds it back
to what the "head_page" pointer was. But because the head_page
pointer may lack behind the real head page (held by the link
list pointer). The reset may be incorrect.

Instead, if the head_page exists (it does not on first allocation)
reset it back to the real head page before running the consistency
tests. Then it will be put back to its original location after
the tests are complete.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

308f7eeb

ring-buffer: Add integrity check at end of iter read · 659f451f

由 Steven Rostedt 提交于 5月 14, 2012

There use to be ring buffer integrity checks after updating the
size of the ring buffer. But now that the ring buffer can modify
the size while the system is running, the integrity checks were
removed, as they require the ring buffer to be disabed to perform
the check.

Move the integrity check to the reading of the ring buffer via the
iterator reads (the "trace" file). As reading via an iterator requires
disabling the ring buffer, it is a perfect place to have it.

If the ring buffer happens to be disabled when updating the size,
we still perform the integrity check.

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

659f451f

fork: move the real prepare_to_copy() users to arch_dup_task_struct() · 55ccf3fe

由 Suresh Siddha 提交于 5月 16, 2012

Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
the architectures and the rest following the x86 model of flushing the extended
register state like fpu there.

Remove it and use the arch_dup_task_struct() instead.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.comAcked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

55ccf3fe

ring-buffer: Make addition of pages in ring buffer atomic · 5040b4b7

由 Vaibhav Nagarnaik 提交于 5月 03, 2012

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.

Link: http://lkml.kernel.org/r/1336096792-25373-2-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5040b4b7

ring-buffer: Make removal of ring buffer pages atomic · 83f40318

由 Vaibhav Nagarnaik 提交于 5月 03, 2012

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing the pages after the tail page. This makes sure
that first all the empty pages in the ring buffer are removed. If the
head page is one in the list of pages to be removed, then the page after
the removed ones is made the head page. This removes the oldest data
from the ring buffer and keeps the latest data around to be read.

To do this in a non-racey manner, tracing is stopped for a very short
time while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed after the tracing is restarted to
minimize the time needed to stop tracing.

The context in which the pages from the per-cpu ring buffer are removed
runs on the respective CPU. This minimizes the events not traced to only
NMI trace contexts.

Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

83f40318

tracing: Clean up tracing_mark_write() · 6edb2a8a

由 Steven Rostedt 提交于 5月 11, 2012

On gcc 4.5 the function tracing_mark_write() would give a warning
of page2 being uninitialized. This is due to a bug in gcc because
the logic prevents page2 from being used uninitialized, and
gcc 4.6+ does not complain (correctly).

Instead of adding a "unitialized" around page2, which could show
a bug later on, I combined page1 and page2 into an array map_pages[].
This binds the two and the two are modified according to nr_pages
(what gcc 4.5 seems to ignore). This no longer gives a warning with
gcc 4.5 nor with gcc 4.6.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6edb2a8a

16 5月, 2012 3 次提交

E
userns: Convert cgroup permission checks to use uid_eq · 14a590c3
由 Eric W. Biederman 提交于 3月 12, 2012
```
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
```
14a590c3

userns: signal remove unnecessary map_cred_ns · 54ba47ed

由 Eric W. Biederman 提交于 3月 13, 2012

map_cred_ns is a light wrapper around from_kuid with the order of the arguments
reversed. Replace map_cred_ns with from_kuid and remove map_cred_ns.
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

54ba47ed

E
userns: Teach inode_capable to understand inodes whose uids map to other namespaces. · 65cc5a17
由 Eric W. Biederman 提交于 3月 12, 2012
```
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
```
65cc5a17

15 5月, 2012 4 次提交

genirq: export handle_edge_irq() and irq_to_desc() · 3911ff30

由 Jiri Kosina 提交于 5月 13, 2012

Export handle_edge_irq() and irq_to_desc() to modules to allow them to
do things such as

	__irq_set_handler_locked(...., handle_edge_irq);

This fixes

	ERROR: "handle_edge_irq" [drivers/gpio/gpio-pch.ko] undefined!
	ERROR: "irq_to_desc" [drivers/gpio/gpio-pch.ko] undefined!

when gpio-pch is being built as a module.

This was introduced by commit df9541a6 ("gpio: pch9: Use proper flow
type handlers") that added

	__irq_set_handler_locked(d->irq, handle_edge_irq);

but handle_edge_irq() was not exported for modules (and inlined
__irq_set_handler_locked() requires irq_to_desc() exported as well)
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3911ff30

lockdep: fix oops in processing workqueue · 4d82a1de

由 Peter Zijlstra 提交于 5月 15, 2012

Under memory load, on x86_64, with lockdep enabled, the workqueue's
process_one_work() has been seen to oops in __lock_acquire(), barfing
on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].

Because it's permissible to free a work_struct from its callout function,
the map used is an onstack copy of the map given in the work_struct: and
that copy is made without any locking.

Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
"rep movsq" for that structure copy: which might race with a workqueue
user's wait_on_work() doing lock_map_acquire() on the source of the
copy, putting a pointer into the class_cache[], but only in time for
the top half of that pointer to be copied to the destination map.

Boom when process_one_work() subsequently does lock_map_acquire()
on its onstack copy of the lockdep_map.

Fix this, and a similar instance in call_timer_fn(), with a
lockdep_copy_map() function which additionally NULLs the class_cache[].

Note: this oops was actually seen on 3.4-next, where flush_work() newly
does the racing lock_map_acquire(); but Tejun points out that 3.4 and
earlier are already vulnerable to the same through wait_on_work().

* Patch orginally from Peter.  Hugh modified it a bit and wrote the
  description.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Reported-by: NHugh Dickins <hughd@google.com>
LKML-Reference: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
Signed-off-by: NTejun Heo <tj@kernel.org>

4d82a1de

workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active · 544ecf31

由 Tejun Heo 提交于 5月 14, 2012

worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
isn't zero when every worker is idle.  This can trigger spuriously
while a cpu is going down due to the way trustee sets %WORKER_ROGUE
and zaps nr_running.

It first sets %WORKER_ROGUE on all workers without updating
nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
then zaps nr_running.  If the last running worker enters idle
inbetween, it would see stale nr_running which hasn't been zapped yet
and trigger the WARN_ON_ONCE().

Fix it by performing the sanity check iff the trustee is idle.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org

544ecf31

printk() - isolate KERN_CONT users from ordinary complete lines · c313af14

由 Kay Sievers 提交于 5月 14, 2012

Arrange the continuation printk() buffering to be fully separated from the
ordinary full line users.

Limit the exposure to races and wrong printk() line merges to users of
continuation only. Ordinary full line users racing against continuation
users will no longer affect each other.

Multiple continuation users from different threads, racing against each
other will not wrongly be merged into a single line, but printed as
separate lines.

Test output of a kernel module which starts two separate threads which
race against each other, one of them printing a single full terminated
line:
  printk("(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)\n");

The other one printing the line, every character separate in a
continuation loop:
  printk("(C");
  for (i = 0; i < 58; i++)
          printk(KERN_CONT "C");
  printk(KERN_CONT "C)\n");

Behavior of single and non-thread-aware printk() buffer:
  # modprobe printk-race
  printk test init
  (CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
  (CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)

New behavior with separate and thread-aware continuation buffer:
  # modprobe printk-race
  printk test init
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
  (CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
  (CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Ingo Molnar  <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: NKay Sievers <kay@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c313af14

14 5月, 2012 4 次提交

printk() - restore prefix/timestamp printing for multi-newline strings · 3ce9a7c0

由 Kay Sievers 提交于 5月 13, 2012

Calls like:
  printk("\n *** DEADLOCK ***\n\n");
will print 3 properly indented, separated, syslog + timestamp prefixed lines in
the log output.
Reported-By: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NKay Sievers <kay@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3ce9a7c0

sched/debug: Fix printing large integers on 32-bit platforms · 13e099d2

由 Peter Zijlstra 提交于 5月 14, 2012

Some numbers like nr_running and nr_uninterruptible are fundamentally
unsigned since its impossible to have a negative amount of tasks, yet
we still print them as signed to easily recognise the underflow
condition.

rq->nr_uninterruptible has 'special' accounting and can in fact very
easily become negative on a per-cpu basis.

It was noted that since the P() macro assumes things are long long and
the promotion of unsigned 'int/long' to long long on 32bit doesn't
sign extend we print silly large numbers instead of the easier to read
signed numbers.

Therefore extend the P() macro to not require the sign extention.
Reported-by: NDiwakar Tundlam <dtundlam@nvidia.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-gk5tm8t2n4ix2vkpns42uqqp@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

13e099d2

sched/fair: Improve the ->group_imb logic · e44bc5c5

由 Peter Zijlstra 提交于 5月 11, 2012

Group imbalance is meant to deal with situations where affinity masks
and sched domains don't align well, such as 3 cpus from one group and
6 from another. In this case the domain based balancer will want to
put an equal amount of tasks on each side even though they don't have
equal cpus.

Currently group_imb is set whenever two cpus of a group have a weight
difference of at least one avg task and the heaviest cpu has at least
two tasks. A group with imbalance set will always be picked as busiest
and a balance pass will be forced.

The problem is that even if there are no affinity masks this stuff can
trigger and cause weird balancing decisions, eg. the observed
behaviour was that of 6 cpus, 5 had 2 and 1 had 3 tasks, due to the
difference of 1 avg load (they all had the same weight) and nr_running
being >1 the group_imbalance logic triggered and did the weird thing
of pulling more load instead of trying to move the 1 excess task to
the other domain of 6 cpus that had 5 cpu with 2 tasks and 1 cpu with
1 task.

Curb the group_imbalance stuff by making the nr_running condition
weaker by also tracking the min_nr_running and using the difference in
nr_running over the set instead of the absolute max nr_running.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-9s7dedozxo8kjsb9kqlrukkf@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e44bc5c5

sched/nohz: Fix rq->cpu_load[] calculations · 556061b0

由 Peter Zijlstra 提交于 5月 11, 2012

While investigating why the load-balancer did funny I found that the
rq->cpu_load[] tables were completely screwy.. a bit more digging
revealed that the updates that got through were missing ticks followed
by a catchup of 2 ticks.

The catchup assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc..) this means that
esp. the higher indices were significantly lower than they ought to
be.

The reason for this is that its not correct to compare against jiffies
on every jiffy on any other cpu than the cpu that updates jiffies.

This patch cludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

556061b0

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多