提交 · e839ca528718e68cad32a307dc9aabf01ef3eb05 · OpenHarmony / kernel_linux

14 2月, 2012 1 次提交

由 Al Viro 提交于 2月 13, 2012

Trim security.h
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NJames Morris <jmorris@namei.org>

40401530

05 12月, 2011 1 次提交

x86: Panic on detection of stack overflow · 55af7796

由 Mitsuo Hayasaka 提交于 11月 29, 2011

Currently, messages are just output on the detection of stack
overflow, which is not sufficient for systems that need a
high reliability. This is because in general the overflow may
corrupt data, and the additional corruption may occur due to
reading them unless systems stop.

This patch adds the sysctl parameter
kernel.panic_on_stackoverflow and causes a panic when detecting
the overflows of kernel, IRQ and exception stacks except user
stack according to the parameter. It is disabled by default.
Signed-off-by: NMitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jpSigned-off-by: NIngo Molnar <mingo@elte.hu>

55af7796

01 11月, 2011 1 次提交

kernel/sysctl.c: add cap_last_cap to /proc/sys/kernel · 73efc039

由 Dan Ballard 提交于 10月 31, 2011

Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from the header files
only.  The fact that this value cannot be determined properly right now
creates various problems for libraries compiled on newer header files
which are run on older kernels.  They assume capabilities are available
which actually aren't.  libcap-ng is one example.  And we ran into the
same problem with systemd too.

Now the capability is exported in /proc/sys/kernel/cap_last_cap.

[akpm@linux-foundation.org: make cap_last_cap const, per Ulrich]
Signed-off-by: NDan Ballard <dan@mindstab.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ulrich Drepper <drepper@akkadia.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73efc039

30 10月, 2011 1 次提交

[S390] sparse: fix sparse warnings about missing prototypes · 638ad34a

由 Martin Schwidefsky 提交于 10月 30, 2011

Add prototypes and includes for functions used in different modules.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

638ad34a

14 8月, 2011 1 次提交

sched: Accumulate per-cfs_rq cpu usage and charge against bandwidth · ec12cb7f

由 Paul Turner 提交于 7月 21, 2011

Account bandwidth usage on the cfs_rq level versus the task_groups to which
they belong. Whether we are tracking bandwidth on a given cfs_rq is maintained
under cfs_rq->runtime_enabled.

cfs_rq's which belong to a bandwidth constrained task_group have their runtime
accounted via the update_curr() path, which withdraws bandwidth from the global
pool as desired. Updates involving the global pool are currently protected
under cfs_bandwidth->lock, local runtime is protected by rq->lock.

This patch only assigns and tracks quota, no action is taken in the case that
cfs_rq->runtime_used exceeds cfs_rq->runtime_assigned.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NNikhil Rao <ncrao@google.com>
Signed-off-by: NBharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110721184757.179386821@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

ec12cb7f

21 7月, 2011 1 次提交

sysctl,rcu: Convert call_rcu(free_head) to kfree · a95cded3

由 Paul E. McKenney 提交于 5月 01, 2011

The RCU callback free_head just calls kfree(), so we can use kfree_rcu()
instead of call_rcu().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

a95cded3

04 6月, 2011 1 次提交

perf: Comment /proc/sys/kernel/perf_event_paranoid to be part of user ABI · aa4a2218

由 Vince Weaver 提交于 6月 03, 2011

Turns out that distro packages use this file as an indicator of
the perf event subsystem - this is easier to check for from scripts
than the existence of the system call.

This is easy enough to keep around for the kernel, so add a
comment to make sure it stays so.
Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031751170.29381@cl320.eecs.utk.eduSigned-off-by: NIngo Molnar <mingo@elte.hu>

aa4a2218

23 5月, 2011 1 次提交

watchdog: Disable watchdog when thresh is zero · 586692a5

由 Mandeep Singh Baines 提交于 5月 22, 2011

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

In addition, the logic of proc_dowatchdog_thresh and
proc_dowatchdog_enabled has been factored into proc_dowatchdog.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-3-git-send-email-msb@chromium.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071018.GE22305@elte.hu>

586692a5

20 5月, 2011 1 次提交

arch/tile: support signal "exception-trace" hook · 571d76ac

由 Chris Metcalf 提交于 5月 16, 2011

This change adds support for /proc/sys/debug/exception-trace to tile.
Like x86 and sparc, by default it is set to "1", generating a one-line
printk whenever a user process crashes.  By setting it to "2", we get
a much more complete userspace diagnostic at crash time, including
a user-space backtrace, register dump, and memory dump around the
address of the crash.

Some vestiges of the Tilera-internal version of this support are
removed with this patch (the show_crashinfo variable and the
arch_coredump_signal function).  We retain a "crashinfo" boot parameter
which allows you to set the boot-time value of exception-trace.
Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>

571d76ac

04 4月, 2011 1 次提交

capabilites: allow the application of capability limits to usermode helpers · 17f60a7d

由 Eric Paris 提交于 4月 01, 2011

There is no way to limit the capabilities of usermodehelpers. This problem
reared its head recently when someone complained that any user with
cap_net_admin was able to load arbitrary kernel modules, even though the user
didn't have cap_sys_module.  The reason is because the actual load is done by
a usermode helper and those always have the full cap set.  This patch addes new
sysctls which allow us to bound the permissions of usermode helpers.

/proc/sys/kernel/usermodehelper/bset
/proc/sys/kernel/usermodehelper/inheritable

You must have CAP_SYS_MODULE  and CAP_SETPCAP to change these (changes are
&= ONLY).  When the kernel launches a usermodehelper it will do so with these
as the bset and pI.

-v2:	make globals static
	create spinlock to protect globals

-v3:	require both CAP_SETPCAP and CAP_SYS_MODULE
-v4:	fix the typo s/CAP_SET_PCAP/CAP_SETPCAP/ because I didn't commit
Signed-off-by: NEric Paris <eparis@redhat.com>
No-objection-from: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: NAndrew G. Morgan <morgan@kernel.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

17f60a7d

24 3月, 2011 2 次提交

sysctl: restrict write access to dmesg_restrict · bfdc0b49

由 Richard Weinberger 提交于 3月 23, 2011

When dmesg_restrict is set to 1 CAP_SYS_ADMIN is needed to read the kernel
ring buffer.  But a root user without CAP_SYS_ADMIN is able to reset
dmesg_restrict to 0.

This is an issue when e.g.  LXC (Linux Containers) are used and complete
user space is running without CAP_SYS_ADMIN.  A unprivileged and jailed
root user can bypass the dmesg_restrict protection.

With this patch writing to dmesg_restrict is only allowed when root has
CAP_SYS_ADMIN.
Signed-off-by: NRichard Weinberger <richard@nod.at>
Acked-by: NDan Rosenberg <drosenberg@vsecurity.com>
Acked-by: NSerge E. Hallyn <serge@hallyn.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bfdc0b49

sysctl: add some missing input constraint checks · cb16e95f

由 Petr Holasek 提交于 3月 23, 2011

Add boundaries of allowed input ranges for: dirty_expire_centisecs,
drop_caches, overcommit_memory, page-cluster and panic_on_oom.
Signed-off-by: NPetr Holasek <pholasek@redhat.com>
Acked-by: NDave Young <hidave.darkstar@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb16e95f

08 3月, 2011 1 次提交

unfuck proc_sysctl ->d_compare() · dfef6dcd

由 Al Viro 提交于 3月 08, 2011

a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dfef6dcd

23 2月, 2011 1 次提交

sched, autogroup, sysctl: Use proc_dointvec_minmax() instead · 1747b21f

由 Yong Zhang 提交于 2月 20, 2011

sched_autogroup_enabled has min/max value, proc_dointvec_minmax() is
be used for this case.
Signed-off-by: NYong Zhang <yong.zhang0@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1298185696-4403-2-git-send-email-yong.zhang0@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1747b21f

16 2月, 2011 1 次提交

perf: Optimize throttling code · 163ec435

由 Peter Zijlstra 提交于 2月 16, 2011

By pre-computing the maximum number of samples per tick we can avoid a
multiplication and a conditional since MAX_INTERRUPTS >
max_samples_per_tick.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

163ec435

03 2月, 2011 1 次提交

sched: Use a buddy to implement yield_task_fair() · ac53db59

由 Rik van Riel 提交于 2月 01, 2011

Use the buddy mechanism to implement yield_task_fair.  This
allows us to skip onto the next highest priority se at every
level in the CFS tree, unless doing so would introduce gross
unfairness in CPU time distribution.

We order the buddy selection in pick_next_entity to check
yield first, then last, then next.  We need next to be able
to override yield, because it is possible for the "next" and
"yield" task to be different processen in the same sub-tree
of the CFS tree.  When they are, we need to go into that
sub-tree regardless of the "yield" hint, and pick the correct
entity once we get to the right level.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ac53db59

02 2月, 2011 1 次提交

security: remove unused security_sysctl hook · 4916ca40

由 Lucian Adrian Grijincu 提交于 2月 01, 2011

The only user for this hook was selinux. sysctl routes every call
through /proc/sys/. Selinux and other security modules use the file
system checks for sysctl too, so no need for this hook any more.
Signed-off-by: NLucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

4916ca40

25 1月, 2011 1 次提交

Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent · 8c6a98b2

由 Andy Whitcroft 提交于 1月 24, 2011

Currently sysrq_enabled and __sysrq_enabled are initialised separately
and inconsistently, leading to sysrq being actually enabled by reported
as not enabled in sysfs.  The first change to the sysfs configurable
synchronises these two:

    static int __read_mostly sysrq_enabled = 1;
    static int __sysrq_enabled;

Add a common define to carry the default for these preventing them becoming
out of sync again.  Default this to 1 to mirror previous behaviour.
Signed-off-by: NAndy Whitcroft <apw@canonical.com>
Cc: stable@kernel.org
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

8c6a98b2

14 1月, 2011 3 次提交

sysctl: remove obsolete comments · e020e742

由 Jovi Zhang 提交于 1月 12, 2011

ctl_unnumbered.txt have been removed in Documentation directory so just
also remove this invalid comments

[akpm@linux-foundation.org: fix Documentation/sysctl/00-INDEX, per Dave]
Signed-off-by: NJovi Zhang <bookjovi@gmail.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e020e742

sysctl: fix #ifdef guard comment · 55610500

由 Jovi Zhang 提交于 1月 12, 2011

Signed-off-by: NJovi Zhang <bookjovi@gmail.com>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

55610500

kptr_restrict for hiding kernel pointers from unprivileged users · 455cd5ab

由 Dan Rosenberg 提交于 1月 12, 2011

Add the %pK printk format specifier and the /proc/sys/kernel/kptr_restrict
sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
 If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

[akpm@linux-foundation.org: check for IRQ context when !kptr_restrict, save an indent level, s/WARN/WARN_ONCE/]
[akpm@linux-foundation.org: coding-style fixup]
[randy.dunlap@oracle.com: fix kernel/sysctl.c warning]
Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

455cd5ab

10 12月, 2010 1 次提交

x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls · 5dc30558

由 Don Zickus 提交于 11月 29, 2010

Originally adapted from Huang Ying's patch which moved the
unknown_nmi_panic to the traps.c file.  Because the old nmi
watchdog was deleted before this change happened, the
unknown_nmi_panic sysctl was lost.  This re-adds it.

Also, the nmi_watchdog sysctl was re-implemented and its
documentation updated accordingly.
Patch-inspired-by: NHuang Ying <ying.huang@intel.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Reviewed-by: NCyrill Gorcunov <gorcunov@gmail.com>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5dc30558

30 11月, 2010 1 次提交

sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4

由 Mike Galbraith 提交于 11月 30, 2010

A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity.  This patch
implements an idea from Linus, to automatically create task
groups.  Currently, only per session autogroups are implemented,
but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group.  When a task calls setsid(), a new task group
is created, the process is moved into the new task group, and a
reference to the preveious task group is dropped.  Child
processes inherit this task group thereafter, and increase it's
refcount.  When the last thread of a process exits, the
process's reference is dropped, such that when the last process
referencing an autogroup exits, the autogroup is destroyed.

At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.

Autogroup bandwidth is controllable via setting it's nice level
through the proc filesystem:

  cat /proc/<pid>/autogroup

Displays the task's group and the group's nice level.

  echo <nice level> > /proc/<pid>/autogroup

Sets the task group's shares to the weight of nice <level> task.
Setting nice level is rate limited for !admin users due to the
abuse risk of task group locking.

The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
the boot option noautogroup, and can also be turned on/off on
the fly via:

  echo [01] > /proc/sys/kernel/sched_autogroup_enabled

... which will automatically move tasks to/from the root task group.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
[ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>
LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5091faa4

18 11月, 2010 3 次提交

sched: Add sysctl_sched_shares_window · a7a4f8a7

由 Paul Turner 提交于 11月 15, 2010

Introduce a new sysctl for the shares window and disambiguate it from
sched_time_avg.

A 10ms window appears to be a good compromise between accuracy and performance.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101115234938.112173964@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a7a4f8a7

sched: Rewrite tg_shares_up) · 2069dd75

由 Peter Zijlstra 提交于 11月 15, 2010

By tracking a per-cpu load-avg for each cfs_rq and folding it into a
global task_group load on each tick we can rework tg_shares_up to be
strictly per-cpu.

This should improve cpu-cgroup performance for smp systems
significantly.

[ Paul: changed to use queueing cfs_rq + bug fixes ]
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101115234937.580480400@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2069dd75

x86, nmi_watchdog: Remove the old nmi_watchdog · 5f2b0ba4

由 Don Zickus 提交于 11月 12, 2010

Now that we have a new nmi_watchdog that is more generic and
sits on top of the perf subsystem, we really do not need the old
nmi_watchdog any more.

In addition, the old nmi_watchdog doesn't really work if you are
using the default clocksource, hpet.  The old nmi_watchdog code
relied on local apic interrupts to determine if the cpu is still
alive.  With hpet as the clocksource, these interrupts don't
increment any more and the old nmi_watchdog triggers false
postives.

This piece removes the old nmi_watchdog code and stubs out any
variables and functions calls.  The stubs are the same ones used
by the new nmi_watchdog code, so it should be well tested.
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5f2b0ba4

16 11月, 2010 1 次提交

kernel/sysctl.c: Fix build failure with !CONFIG_PRINTK · df6e61d4

由 Joe Perches 提交于 11月 15, 2010

Sigh...
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df6e61d4

12 11月, 2010 1 次提交

Restrict unprivileged access to kernel syslog · eaf06b24

由 Dan Rosenberg 提交于 11月 11, 2010

The kernel syslog contains debugging information that is often useful
during exploitation of other vulnerabilities, such as kernel heap
addresses.  Rather than futilely attempt to sanitize hundreds (or
thousands) of printk statements and simultaneously cripple useful
debugging functionality, it is far simpler to create an option that
prevents unprivileged users from reading the syslog.

This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
dmesg_restrict sysctl.  When set to "0", the default, no restrictions are
enforced.  When set to "1", only users with CAP_SYS_ADMIN can read the
kernel syslog via dmesg(8) or other mechanisms.

[akpm@linux-foundation.org: explain the config option in kernel.txt]
Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NEugene Teo <eugeneteo@kernel.org>
Acked-by: NKees Cook <kees.cook@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eaf06b24

27 10月, 2010 2 次提交

printk: declare printk_ratelimit_state in ratelimit.h · f5d87d85

由 Namhyung Kim 提交于 10月 26, 2010

Adding declaration of printk_ratelimit_state in ratelimit.h removes
potential build breakage and following sparse warning:

 kernel/printk.c:1426:1: warning: symbol 'printk_ratelimit_state' was not declared. Should it be static?

[akpm@linux-foundation.org: remove unneeded ifdef]
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5d87d85

fs: allow for more than 2^31 files · 518de9b3

由 Eric Dumazet 提交于 10月 26, 2010

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

<quote>

We were seeing a failure which prevented boot.  The kernel was incapable
of creating either a named pipe or unix domain socket.  This comes down
to a common kernel function called unix_create1() which does:

        atomic_inc(&unix_nr_socks);
        if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

        n = (mempages * (PAGE_SIZE / 1024)) / 10;
        files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000).  That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.

</quote>

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of atomic_t.

get_max_files() is changed to return an unsigned long.  get_nr_files() is
changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, while not
strictly needed to address Robin problem.

Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704     0       2147483648
Reported-by: NRobin Holt <holt@sgi.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NDavid Miller <davem@davemloft.net>
Reviewed-by: NRobin Holt <holt@sgi.com>
Tested-by: NRobin Holt <holt@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

518de9b3

26 10月, 2010 3 次提交

fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8

由 Christoph Hellwig 提交于 10月 10, 2010

The nr_dentry stat is a globally touched cacheline and atomic operation
twice over the lifetime of a dentry. It is used for the benfit of userspace
only. Turn it into a per-cpu counter and always decrement it in d_free instead
of doing various batching operations to reduce lock hold times in the callers.

Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

312d3ca8

fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa

由 Dave Chinner 提交于 10月 23, 2010

The number of inodes allocated does not need to be tied to the
addition or removal of an inode to/from a list. If we are not tied
to a list lock, we could update the counters when inodes are
initialised or destroyed, but to do that we need to convert the
counters to be per-cpu (i.e. independent of a lock). This means that
we have the freedom to change the list/locking implementation
without needing to care about the counters.

Based on a patch originally from Eric Dumazet.

[AV: cleaned up a bit, fixed build breakage on weird configs
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cffbc8aa

fs: allow for more than 2^31 files · 7e360c38

由 Eric Dumazet 提交于 10月 05, 2010

Andrew,

Could you please review this patch, you probably are the right guy to
take it, because it crosses fs and net trees.

Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt
depend on previous patch (sysctl: fix min/max handling in
__do_proc_doulongvec_minmax())

Thanks !

[PATCH V4] fs: allow for more than 2^31 files

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

<quote>

We were seeing a failure which prevented boot.  The kernel was incapable
of creating either a named pipe or unix domain socket.  This comes down
to a common kernel function called unix_create1() which does:

        atomic_inc(&unix_nr_socks);
        if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

        n = (mempages * (PAGE_SIZE / 1024)) / 10;
        files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000).  That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.

</quote>

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of
atomic_t.

get_max_files() is changed to return an unsigned long.
get_nr_files() is changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, while not
strictly needed to address Robin problem.

Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704     0       2147483648
Reported-by: NRobin Holt <holt@sgi.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NDavid Miller <davem@davemloft.net>
Reviewed-by: NRobin Holt <holt@sgi.com>
Tested-by: NRobin Holt <holt@sgi.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7e360c38

08 10月, 2010 1 次提交

sysctl: fix min/max handling in __do_proc_doulongvec_minmax() · 27b3d80a

由 Eric Dumazet 提交于 10月 07, 2010

When proc_doulongvec_minmax() is used with an array of longs, and no
min/max check requested (.extra1 or .extra2 being NULL), we dereference a
NULL pointer for the second element of the array.

Noticed while doing some changes in network stack for the "16TB problem"

Fix is to not change min & max pointers in __do_proc_doulongvec_minmax(),
so that all elements of the vector share an unique min/max limit, like
proc_dointvec_minmax().

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

27b3d80a

05 9月, 2010 1 次提交

gcc-4.6: kernel/*: Fix unused but set warnings · b3bd3de6

由 Andi Kleen 提交于 8月 10, 2010

No real bugs I believe, just some dead code.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: andi@firstfloor.org
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b3bd3de6

10 8月, 2010 1 次提交

oom: move sysctl declarations to oom.h · 8e4228e1

由 David Rientjes 提交于 8月 09, 2010

The three oom killer sysctl variables (sysctl_oom_dump_tasks,
sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better
declared in include/linux/oom.h rather than kernel/sysctl.c.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e4228e1

06 8月, 2010 1 次提交

hotplug: Support kernel/hotplug sysctl variable when !CONFIG_NET · 94f17cd7

由 Ian Abbott 提交于 6月 07, 2010

The kernel/hotplug sysctl variable (/proc/sys/kernel/hotplug file) was
made conditional on CONFIG_NET by commit
f743ca5e (applied in 2.6.18) to fix
problems with undefined references in 2.6.16 when CONFIG_HOTPLUG=y &&
!CONFIG_NET, but this restriction is no longer needed.

This patch makes the kernel/hotplug sysctl variable depend only on
CONFIG_HOTPLUG.
Signed-off-by: NIan Abbott <abbotti@mev.co.uk>
Acked-by: NRandy Dunlap <randy.dunlap@oracle.COM>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

94f17cd7

28 7月, 2010 2 次提交

sysctl extern cleanup: inotify · d14f1729

由 Dave Young 提交于 2月 25, 2010

Extern declarations in sysctl.c should be move to their own head file, and
then include them in relavant .c files.

Move inotify_table extern declaration to linux/inotify.h
Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NEric Paris <eparis@redhat.com>

d14f1729

dnotify: move dir_notify_enable declaration · 6e006701

由 Alexey Dobriyan 提交于 1月 20, 2010

Move dir_notify_enable declaration to where it belongs -- dnotify.h .
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

6e006701

23 7月, 2010 1 次提交

slow-work: kill it · 181a51f6

由 Tejun Heo 提交于 7月 20, 2010

slow-work doesn't have any user left.  Kill it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NDavid Howells <dhowells@redhat.com>

181a51f6

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多