Commits · 86a3db5643c7d29bb36ca85c7a4bb67ad4d88d77 · openeuler / raspberrypi-kernel

25 Jan, 2013 5 commits

cgroup: remove duplicate RCU free on struct cgroup · 86a3db56

Authored by Li Zefan on Jan 24, 2013

When destroying a cgroup, though in cgroup_diput() we've called
synchronize_rcu(), we then still have to free it via call_rcu().

The story is, long ago, to fix a race between reading /proc/sched_debug
and freeing a cgroup, the code was changed to use call_rcu(). See
commit a47295e6 ("cgroups: make cgroup_path() RCU-safe").

Now that the cpu cgroup has been fixed so that cpu_cgroup_css_offline()
is used to unregister a task_group, there won't be concurrent access
to this task_group after synchronize_rcu() in diput(). Now we can
just kfree(cgrp).

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

86a3db56

sched: remove redundant NULL cgroup check in task_group_path() · 2a73991b

Authored by Li Zefan on Jan 24, 2013

A task_group won't be online (thus no one can see it) until
cpu_cgroup_css_online(), and at that time tg->css.cgroup has
been initialized, so this NULL check is redundant.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

2a73991b

sched: split out css_online/css_offline from tg creation/destruction · ace783b9

Authored by Li Zefan on Jan 24, 2013

This is a preparation for later patches.

- What do we gain from cpu_cgroup_css_online():

After ss->css_alloc() and before ss->css_online(), there's a small
window that tg->css.cgroup is NULL. With this change, tg won't be seen
before ss->css_online(), where it's added to the global list, so we're
guaranteed we'll never see NULL tg->css.cgroup.

- What do we gain from cpu_cgroup_css_offline():

tg is freed via RCU, so is cgroup. Without this change, this is how
synchronization works:

cgroup_rmdir()
  no ss->css_offline()
diput()
  synchronize_rcu()
  ss->css_free()       <-- unregister tg, and free it via call_rcu()
  kfree_rcu(cgroup)    <-- wait possible refs to cgroup, and free cgroup

We can't just kfree(cgroup), because tg might access tg->css.cgroup.

With this change:

cgroup_rmdir()
  ss->css_offline()    <-- unregister tg
diput()
  synchronize_rcu()    <-- wait possible refs to tg and cgroup
  ss->css_free()       <-- free tg
  kfree_rcu(cgroup)    <-- free cgroup

As you see, kfree_rcu() is redundant now.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>

ace783b9

cgroup: initialize cgrp->dentry before css_alloc() · fe1c06ca

Authored by Li Zefan on Jan 24, 2013

With this change, we're guaranteed that cgroup_path() won't see NULL
cgrp->dentry, and thus we can remove the NULL check in it.

(Well, it's not strictly true, because dummytop.dentry is always NULL,
 but we already handle that separately.)

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

fe1c06ca

cgroup: remove a NULL check in cgroup_exit() · b5d646f5

Authored by Li Zefan on Jan 24, 2013

init_task.cgroups is initialized at boot phase, and whenever a task
is forked, its cgroups pointer is inherited from its parent, and
it's never set to NULL afterwards.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

b5d646f5

23 Jan, 2013 1 commit

cgroup: fix bogus kernel warnings when cgroup_create() failed · 2739d3cc

Authored by Li Zefan on Jan 21, 2013

If cgroup_create() failed and cgroup_destroy_locked() is called to
do cleanup, we'll see a bunch of warnings:

cgroup_addrm_files: failed to remove 2MB.limit_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.usage_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.max_usage_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.failcnt, err=-2
cgroup_addrm_files: failed to remove prioidx, err=-2
cgroup_addrm_files: failed to remove ifpriomap, err=-2
...

We failed to remove those files, because cgroup_create() has failed
before creating those cgroup files.

To fix this, we simply don't warn if cgroup_rm_file() can't find the
cft entry.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

2739d3cc

15 Jan, 2013 2 commits

cgroup: remove synchronize_rcu() from rebind_subsystems() · 130e3695

Authored by Li Zefan on Jan 14, 2013

Nothing's protected by RCU in rebind_subsystems(), and I can't think
of a reason why it is needed.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

130e3695

cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() · 5d65bc0c

Authored by Li Zefan on Jan 14, 2013

These 2 synchronize_rcu()s make attaching a task to a cgroup
quite slow, and it can't be ignored in some situations.

A real case from Colin Cross: Android uses cgroups heavily to
manage thread priorities, putting threads in a background group
with reduced cpu.shares when they are not visible to the user,
and in a foreground group when they are. Some RPCs from foreground
threads to background threads will temporarily move the background
thread into the foreground group for the duration of the RPC.
This results in many calls to cgroup_attach_task.

In cgroup_attach_task() it's task->cgroups that is protected by RCU,
and put_css_set() calls kfree_rcu() to free it.

If we remove this synchronize_rcu(), there can be threads in RCU-read
sections accessing their old cgroup via current->cgroups with
concurrent rmdir operation, but this is safe.

 # time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; }

real    0m2.524s
user    0m0.008s
sys     0m0.004s

With this patch:

real    0m0.004s
user    0m0.004s
sys     0m0.000s

tj: These synchronize_rcu()s are utterly confused.  synchronize_rcu()
    necessarily has to come between two operations to guarantee that
    the changes made by the former operation are visible to all rcu
    readers before proceeding to the latter operation.  Here, the
    synchronize_rcu() calls are at the end of attach operations with
    nothing beyond them.  Their only effect would be delaying completion
    of write(2) to sysfs tasks/procs files until all rcu readers see the
    change, which doesn't mean anything.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Colin Cross <ccross@google.com>

5d65bc0c

11 Jan, 2013 1 commit

cgroup: use new hashtable implementation · 0ac801fe

Authored by Li Zefan on Jan 10, 2013

Switch cgroup to use the new hashtable implementation. No functional changes.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

0ac801fe

08 Jan, 2013 1 commit

cgroup: implement cgroup_rightmost_descendant() · 12a9d2fe

Authored by Tejun Heo on Jan 07, 2013

Implement cgroup_rightmost_descendant() which returns the rightmost
descendant of the specified cgroup.  This can be used to skip the
cgroup's subtree while iterating with
cgroup_for_each_descendant_pre().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizefan@huawei.com>

12a9d2fe
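
As a rough illustration of what cgroup_rightmost_descendant() computes, here is a minimal userspace sketch. It assumes a hypothetical `struct node` with first-child and next-sibling pointers in place of the kernel's RCU-protected cgroup lists, so every name below is illustrative and not the kernel's API.

```c
#include <stddef.h>

/* Hypothetical tree node: children form a singly linked sibling list. */
struct node {
	const char *name;
	struct node *first_child;
	struct node *next_sibling;
};

/*
 * Return the rightmost descendant of pos: repeatedly step to the last
 * child until a node without children is reached. If pos has no
 * children, pos itself is returned.
 */
struct node *rightmost_descendant(struct node *pos)
{
	struct node *last;

	do {
		last = pos;
		pos = NULL;
		/* Walk the sibling list to find the last (rightmost) child. */
		for (struct node *c = last->first_child; c != NULL; c = c->next_sibling)
			pos = c;
	} while (pos != NULL);

	return last;
}
```

In a pre-order walk, the rightmost descendant is the last node visited inside a subtree, so an iterator that jumps to it and then advances lands on the first node outside that subtree.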

26 Dec, 2012 1 commit

pidns: Stop pid allocation when init dies · c876ad76

Authored by Eric W. Biederman on Dec 21, 2012

Oleg pointed out that in a pid namespace the sequence:
- pid 1 becomes a zombie
- setns(thepidns), fork, ...
- reaping pid 1
- the injected processes exiting

can lead to processes attempting to access their child reaper and
instead following a stale pointer.

It is also unfortunate that waitpid for init can return before all of
the processes in the pid namespace have exited.

Avoid these problems by disabling the allocation of new pids in a pid
namespace when init dies, instead of when the last process in a pid
namespace is reaped.

Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

c876ad76

25 Dec, 2012 1 commit

pidns: Outlaw thread creation after unshare(CLONE_NEWPID) · 8382fcac

Authored by Eric W. Biederman on Dec 20, 2012

The sequence:
unshare(CLONE_NEWPID)
clone(CLONE_THREAD|CLONE_SIGHAND|CLONE_VM)

creates a new process in the new pid namespace without setting
pid_ns->child_reaper.  After forking this results in a NULL
pointer dereference.

Avoid this and other nonsense scenarios that can show up after
creating a new pid namespace with unshare by adding a new
check in copy_process.

Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

8382fcac

21 Dec, 2012 2 commits

keys: use keyring_alloc() to create module signing keyring · cfde8190

Authored by David Howells on Dec 20, 2012

Use keyring_alloc() to create special keyrings now that it has
a permissions parameter rather than using key_alloc() +
key_instantiate_and_link().

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cfde8190

kcmp: include linux/ptrace.h · 44fd07e9

Authored by Cyrill Gorcunov on Dec 20, 2012

This makes it compile on s390. After all, ptrace_may_access
(which we use in this file) is declared exactly in linux/ptrace.h.

This is preparatory work to wire this syscall up on all archs.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

44fd07e9

20 Dec, 2012 7 commits

sched: numa: ksm: fix oops in task_numa_placement() · 2832bc19

Authored by Hugh Dickins on Dec 19, 2012

task_numa_placement() oopsed on NULL p->mm when task_numa_fault() got
called in the handling of break_ksm() for ksmd.  That might be a
peculiar case, which perhaps KSM could take steps to avoid, but it's
more robust if task_numa_placement() allows for such a possibility.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2832bc19

new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those · c40702c4

Authored by Al Viro on Nov 20, 2012

Note that they rely on access_ok() having already been checked by the caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c40702c4

generic compat_sys_sigaltstack() · 90268439

Authored by Al Viro on Dec 14, 2012

Again, conditional on CONFIG_GENERIC_SIGALTSTACK.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

90268439

introduce generic sys_sigaltstack(), switch x86 and um to it · 6bf9adfc

Authored by Al Viro on Dec 14, 2012

Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not
select it are completely unaffected.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6bf9adfc

new helper: restore_altstack() · 5c49574f

Authored by Al Viro on Nov 18, 2012

To be used by rt_sigreturn instances.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5c49574f

Bury the conditionals from kernel_thread/kernel_execve series · ae903caa

Authored by Al Viro on Dec 14, 2012

All architectures have
	CONFIG_GENERIC_KERNEL_THREAD
	CONFIG_GENERIC_KERNEL_EXECVE
	__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ae903caa

watchdog: Fix disable/enable regression · 3935e895

Authored by Bjørn Mork on Dec 19, 2012

Commit 8d451690 ("watchdog: Fix CPU hotplug regression") causes an
oops or hard lockup when doing

 echo 0 > /proc/sys/kernel/nmi_watchdog
 echo 1 > /proc/sys/kernel/nmi_watchdog

and the kernel is booted with nmi_watchdog=1 (default)

Running laptop-mode-tools and disconnecting/connecting AC power will
cause this to trigger, making it a common failure scenario on laptops.

Instead of bailing out of watchdog_disable() when !watchdog_enabled we
can initialize the hrtimer regardless of watchdog_enabled status.  This
makes it safe to call watchdog_disable() in the nmi_watchdog=0 case,
without the negative effect on the enabled => disabled => enabled case.

All these tests pass with this patch:
- nmi_watchdog=1
  echo 0 > /proc/sys/kernel/nmi_watchdog
  echo 1 > /proc/sys/kernel/nmi_watchdog

- nmi_watchdog=0
  echo 0 > /sys/devices/system/cpu/cpu1/online

- nmi_watchdog=0
  echo mem > /sys/power/state

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51661

Cc: <stable@vger.kernel.org> # v3.7
Cc: Norbert Warmuth <nwarmuth@t-online.de>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3935e895

19 Dec, 2012 3 commits

fork: protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs · 2ad306b1

Authored by Glauber Costa on Dec 18, 2012

Because those architectures will draw their stacks directly from the page
allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG
flag, and issue the corresponding free_pages.

This code path is taken when the architecture doesn't define
CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has
THREAD_SIZE >= PAGE_SIZE.  Luckily, most - if not all - of the remaining
architectures fall in this category.

This will guarantee that every stack page is accounted to the memcg the
process currently lives in, and will cause the allocations to fail if they
go over the limit.

For the time being, I am defining a new variant of THREADINFO_GFP, not to
mess with the other path.  Once the slab is also tracked by memcg, we can
get rid of that flag.

Tested to successfully protect against :(){ :|:& };:

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2ad306b1

res_counter: return amount of charges after res_counter_uncharge() · 50bdd430

Authored by Glauber Costa on Dec 18, 2012

It is useful to know how many charges are still left after a call to
res_counter_uncharge.  While it is possible to issue a res_counter_read
after uncharge, this can be racy.

If we need, for instance, to take some action when the counters drop down
to 0, only one of the callers should see it.  This is the same semantics
as the atomic variables in the kernel.

Since the current return value is void, we don't need to worry about
anything breaking due to this change: nobody relied on that, and only
users appearing from now on will be checking this value.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

50bdd430
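
To illustrate why returning the remaining charge from the uncharge call avoids the race described in the res_counter commit above, here is a minimal userspace sketch. It uses C11 atomics and a hypothetical `struct sketch_counter` rather than the kernel's lock-protected res_counter, and it assumes callers never uncharge more than they charged.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource counter; not the kernel's struct. */
struct sketch_counter {
	_Atomic uint64_t usage;
};

/* Add val to the counter and return the usage after the charge. */
uint64_t sketch_charge(struct sketch_counter *c, uint64_t val)
{
	return atomic_fetch_add(&c->usage, val) + val;
}

/*
 * Subtract val and return the usage left after this uncharge.
 * Because the subtraction and the read of the result are one atomic
 * step, exactly one caller observes the drop to 0. A separate read
 * after the uncharge could let several callers (or none) see 0.
 * Assumption: val never exceeds the current usage.
 */
uint64_t sketch_uncharge(struct sketch_counter *c, uint64_t val)
{
	return atomic_fetch_sub(&c->usage, val) - val;
}
```

A caller that must run cleanup when the counter reaches zero can then test `sketch_uncharge(&c, n) == 0` directly, much as kernel code tests the result of an atomic decrement.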

irq: tsk->comm is an array · 19af395d

Authored by Alan Cox on Dec 18, 2012

The array check is useless so remove it.

[akpm@linux-foundation.org: remove comment, per David]

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19af395d

18 Dec, 2012 9 commits

pidns: remove unused is_container_init() · a5ba911e

Authored by Gao feng on Dec 17, 2012

Since commit 1cdcbec1 ("CRED: Neuter sys_capset()")
is_container_init() has no callers.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a5ba911e

ptrace: introduce PTRACE_O_EXITKILL · 992fb6e1

Authored by Oleg Nesterov on Dec 17, 2012

Ptrace jailers want to be sure that the tracee can never escape
from their control. However, if the tracer dies unexpectedly, the
tracee continues to run in a potentially unsafe mode.

Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits,
it sends SIGKILL to every tracee which has this bit set.

Note that the new option is not equal to the last-option << 1, because
currently all options have an event, and the new one starts the eventless
group.  It uses the arbitrarily chosen bit 20, so we have room for 12 more
events, but we can also add new eventless options below this one.

Suggested by Amnon Shiloh.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Chris Evans <scarybeasts@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

992fb6e1
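
The bit layout described in the PTRACE_O_EXITKILL commit above can be sketched as follows. The macro and function names are local stand-ins defined only for this illustration. The position of the highest event bit (7) is an assumption inferred from the message's own arithmetic (bit 20 with room for 12 more events), not something the commit states directly.

```c
#include <stdint.h>

/* Assumed: highest bit currently used by an option that has an event. */
#define SKETCH_LAST_EVENT_BIT 7u
/* Bit 20, as stated in the commit message; starts the eventless group. */
#define SKETCH_O_EXITKILL (1u << 20)

/* Count the unused bits between the last event option and EXITKILL. */
unsigned int sketch_free_event_bits(void)
{
	unsigned int n = 0;

	for (unsigned int bit = SKETCH_LAST_EVENT_BIT + 1;
	     (1u << bit) < SKETCH_O_EXITKILL; bit++)
		n++;
	return n;
}

/* Decide whether a tracee with these options dies when its tracer exits. */
int sketch_kill_on_tracer_exit(uint32_t tracee_opts)
{
	return (tracee_opts & SKETCH_O_EXITKILL) != 0;
}
```

Under that assumption the gap works out to the 12 spare event bits the message mentions, and the option is a plain flag test with no associated event number.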

compat: generic compat_sys_sched_rr_get_interval() implementation · 0ad50c38

Authored by Catalin Marinas on Dec 17, 2012

This function is used by sparc, powerpc, tile and arm64 for compat support.
The patch adds a generic implementation with a wrapper for PowerPC to do
the u32->int sign extension.

The reason for a single patch covering powerpc, tile, sparc and arm64 is
to keep it bisectable; otherwise the kernel build may fail with mismatched
function declarations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [for tile]
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0ad50c38

trace: use kbasename() · b2e902f0

Authored by Andy Shevchenko on Dec 17, 2012

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b2e902f0

printk: boot_delay should only affect output · 2fa72c8f

Authored by Andrew Cooks on Dec 17, 2012

The boot_delay parameter affects all printk(), even if the log level
prevents visible output from the call.  It results in delays greater than
the user intended without purpose.

This patch changes the behaviour of boot_delay to only delay output.

Signed-off-by: Andrew Cooks <acooks@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2fa72c8f
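
The behaviour change in the boot_delay commit above can be sketched as a small decision function. This is a userspace illustration with hypothetical names, not the kernel's code. It assumes the usual printk convention that a message reaches the console only when its level is numerically lower than the console log level.

```c
/*
 * Return how many milliseconds to delay for one message.
 * boot_delay_ms:     the configured per-message delay (0 disables it)
 * level:             the message's log level (lower is more severe)
 * console_loglevel:  messages with level below this are shown
 */
unsigned int sketch_delay_for_message(unsigned int boot_delay_ms,
				      int level, int console_loglevel)
{
	if (boot_delay_ms == 0)
		return 0;		/* feature disabled */
	if (level >= console_loglevel)
		return 0;		/* suppressed message: nothing to pace */
	return boot_delay_ms;		/* visible message: apply the delay */
}
```

With this check, a flood of debug-level messages that never reach the console no longer stretches boot time by one delay per message.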

watchdog: store the watchdog sample period as a variable · 0f34c400

Authored by Chuansheng Liu on Dec 17, 2012

Currently the sample period is always obtained through a calculation:
get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5).

We can store the sample period in a variable instead, and mark it
__read_mostly.

Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0f34c400
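
A minimal sketch of the caching idea from the watchdog commit above: compute the period once when the threshold changes and read the stored value afterwards. The function and variable names are hypothetical, and the soft-lockup threshold is passed in as a parameter because the commit message does not say how get_softlockup_thresh() derives it.

```c
#include <stdint.h>

#define SKETCH_NS_PER_SEC 1000000000ULL

/* Cached sample period in nanoseconds; written rarely, read often. */
static uint64_t sketch_sample_period_ns;

/* Recompute the period; call only when the threshold changes. */
void sketch_set_sample_period(unsigned int softlockup_thresh_sec)
{
	/* Same formula as in the commit message: thresh * (NSEC_PER_SEC / 5). */
	sketch_sample_period_ns = softlockup_thresh_sec * (SKETCH_NS_PER_SEC / 5);
}

/* Hot path: just return the stored value. */
uint64_t sketch_get_sample_period(void)
{
	return sketch_sample_period_ns;
}
```

For an illustrative threshold of 20 seconds, the formula gives 20 * 200000000 ns, that is, a 4 second sample period, or five samples per threshold interval.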

lseek: the "whence" argument is called "whence" · 965c8e59

Authored by Andrew Morton on Dec 17, 2012

But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

965c8e59

kernel: remove reference to feature-removal-schedule.txt · 8ec7d50f

Authored by Tao Ma on Dec 17, 2012

In commit 9c0ece06 ("Get rid of Documentation/feature-removal.txt"),
Linus removed feature-removal-schedule.txt from Documentation, but there
are still some references to this file.  Remove them.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ec7d50f

sched: numa: Fix build error if CONFIG_NUMA_BALANCING && !CONFIG_TRANSPARENT_HUGEPAGE · 221392c3

Authored by Mel Gorman on Dec 17, 2012

Michal Hocko reported that the following build error occurs if
CONFIG_NUMA_BALANCING is set without THP support:

  kernel/sched/fair.c: In function ‘task_numa_work’:
  kernel/sched/fair.c:932:55: error: call to ‘__build_bug_failed’ declared with attribute error: BUILD_BUG failed

The problem is that HPAGE_PMD_SHIFT triggers a BUILD_BUG() on
!CONFIG_TRANSPARENT_HUGEPAGE. This patch addresses the problem.

Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

221392c3

17 Dec, 2012 1 commit

random: Mix cputime from each thread that exits to the pool · 61337054

Authored by Nick Kossifidis on Dec 16, 2012

When a thread exits, mix its cputime (userspace + kernelspace) into the
entropy pool.

We don't know how "random" this is, so we use add_device_randomness(),
which doesn't touch the entropy count.

Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

61337054

15 Dec, 2012 3 commits

userns: Fix typo in description of the limitation of userns_install · 5155040e

Authored by Eric W. Biederman on Dec 09, 2012

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

5155040e

userns: Add a more complete capability subset test to commit_creds · aa6d054e

Authored by Eric W. Biederman on Dec 14, 2012

When unsharing a user namespace we reduce our credentials to just what
can be done in that user namespace.  This is a subset of the credentials
we previously had.  Teach commit_creds to recognize this is a subset
of the credentials we have had before and don't clear the dumpability flag.

This allows an unprivileged program to do:
unshare(CLONE_NEWUSER);
fd = open("/proc/self/uid_map", O_RDWR);

Where previously opening the uid_map writable would fail because
the task had been made non-dumpable.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

aa6d054e

userns: Require CAP_SYS_ADMIN for most uses of setns. · 5e4a0847

Authored by Eric W. Biederman on Dec 14, 2012

Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
the permissions of setns.  With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user namespace of the target namespace.

This made the following nasty sequence possible:

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
	char path[PATH_MAX];
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
	fd = open(path, O_RDONLY);
	setns(fd, 0);
	system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joining all but the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

5e4a0847

14 Dec, 2012 3 commits

Revert "sched: Update_cfs_shares at period edge" · 17bc14b7

Authored by Linus Torvalds on Dec 14, 2012

This reverts commit f269ae04.

It turns out it causes a very noticeable interactivity regression with
CONFIG_SCHED_AUTOGROUP (test-case: "make -j32" of the kernel in a
terminal window, while scrolling in a browser - the autogrouping means
that the two end up in separate cgroups, and the browser should be
smooth as silk despite the high load).

Says Paul Turner:
 "It seems that the update-throttling on the wake-side is reducing the
  interactive tasks' ability to preempt.  While I suspect the right
  longer term answer here is force these updates only in the
  cross-cgroup case; this is less trivial.  For this release I believe
  the right answer is either going to be a revert or restore the updates
  on the enqueue-side."

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Bisected-by: Mike Galbraith <efault@gmx.de>
Acked-by: Paul Turner <pjt@google.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17bc14b7

MODSIGN: Fix kbuild output when using default extra_certificates · e10e1774

Authored by Michal Marek on Dec 11, 2012

Reported-by: Peter Foley <pefoley2@verizon.net>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Acked-by: Peter Foley <pefoley2@verizon.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

e10e1774

MODSIGN: Avoid using .incbin in C source · 919aa45e

Authored by Takashi Iwai on Dec 11, 2012

Using the asm .incbin statement in C sources breaks any gcc wrapper which
assumes that preprocessed C source is self-contained. Use a separate .S
file to include the signing key and certificate.

[ This means we no longer need SYMBOL_PREFIX which is defined in kernel.h
  from cbdbf2ab, so I removed it -- RR ]

Tested-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: James Hogan <james.hogan@imgtec.com>

919aa45e