提交 · 33faba7fa7f2288d2f8aaea95958b2c97bf9ebfb · openanolis / cloud-kernel

14 1月, 2014 4 次提交

audit: listen in all network namespaces · 33faba7f

由 Richard Guy Briggs 提交于 7月 16, 2013

Convert audit from only listening in init_net to use register_pernet_subsys()
to dynamically manage the netlink socket list.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

33faba7f

audit: restore order of tty and ses fields in log output · 2f2ad101

由 Richard Guy Briggs 提交于 7月 15, 2013

When being refactored from audit_log_start() to audit_log_task_info(), in
commit e23eb920 the tty and ses fields in the log output got transposed.
Restore to original order to avoid breaking search tools.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

2f2ad101

audit: fix netlink portid naming and types · f9441639

由 Richard Guy Briggs 提交于 8月 14, 2013

Normally, netlink ports use the PID of the userspace process as the port ID.
If the PID is already in use by a port, the kernel will allocate another port
ID to avoid conflict.  Re-name all references to netlink ports from pid to
portid to reflect this reality and avoid confusion with actual PIDs.  Ports
use the __u32 type, so re-type all portids accordingly.

(This patch is very similar to ebiederman's 5deadd69)
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

f9441639

audit: Simplify and correct audit_log_capset · ca24a23e

由 Eric W. Biederman 提交于 3月 19, 2013

- Always report the current process as capset now always only works on
  the current process.  This prevents reporting 0 or a random pid in
  a random pid namespace.

- Don't bother to pass the pid as is available.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
(cherry picked from commit bcc85f0af31af123e32858069eb2ad8f39f90e67)
(cherry picked from commit f911cac4556a7a23e0b3ea850233d13b32328692)
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[eparis: fix build error when audit disabled]
Signed-off-by: NEric Paris <eparis@redhat.com>

ca24a23e

07 11月, 2013 1 次提交

audit: fix type of sessionid in audit_set_loginuid() · 9175c9d2

由 Eric Paris 提交于 11月 06, 2013

sfr pointed out that with CONFIG_UIDGID_STRICT_TYPE_CHECKS set the audit
tree would not build.  This is because the oldsessionid in
audit_set_loginuid() was accidentally being declared as a kuid_t.  This
patch fixes that declaration mistake.

Example of problem:
kernel/auditsc.c: In function 'audit_set_loginuid':
kernel/auditsc.c:2003:15: error: incompatible types when assigning to
type 'kuid_t' from type 'int'
  oldsessionid = audit_get_sessionid(current);
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NEric Paris <eparis@redhat.com>

9175c9d2

06 11月, 2013 24 次提交

audit: call audit_bprm() only once to add AUDIT_EXECVE information · 9410d228

由 Richard Guy Briggs 提交于 10月 30, 2013

Move the audit_bprm() call from search_binary_handler() to exec_binprm().  This
allows us to get rid of the mm member of struct audit_aux_data_execve since
bprm->mm will equal current->mm.

This also mitigates the issue that ->argc could be modified by the
load_binary() call in search_binary_handler().

audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
context every time search_binary_handler() was recursively called.  Only one
reference is necessary.
Reported-by: NOleg Nesterov <onestero@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>
---
This patch is against 3.11, but was developed on Oleg's post-3.11 patches that
introduce exec_binprm().

9410d228

audit: move audit_aux_data_execve contents into audit_context union · d9cfea91

由 Richard Guy Briggs 提交于 10月 30, 2013

audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
context every time search_binary_handler() was recursively called.  Only one
reference is necessary, so just update it.  Move the the contents of
audit_aux_data_execve into the union in audit_context, removing dependence on a
kmalloc along the way.
Reported-by: NOleg Nesterov <onestero@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

d9cfea91

audit: remove unused envc member of audit_aux_data_execve · 9462dc59

由 Richard Guy Briggs 提交于 10月 23, 2013

Get rid of write-only audit_aux_data_exeve structure member envc.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

9462dc59

audit: Kill the unused struct audit_aux_data_capset · bd131fb1

由 Eric W. Biederman 提交于 3月 19, 2013

Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
(cherry picked from ebiederman commit 6904431d6b41190e42d6b94430b67cb4e7e6a4b7)
Signed-off-by: NEric Paris <eparis@redhat.com>

bd131fb1

audit: do not reject all AUDIT_INODE filter types · 78122037

由 Eric Paris 提交于 9月 04, 2013

commit ab61d38e tried to merge the
invalid filter checking into a single function.  However AUDIT_INODE
filters were not verified in the new generic checker.  Thus such rules
were being denied even though they were perfectly valid.

Ex:
$ auditctl -a exit,always -F arch=b64 -S open -F key=/foo -F inode=6955 -F devmajor=9 -F devminor=1
Error sending add rule data request (Invalid argument)
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

78122037

audit: log the audit_names record type · d3aea84a

由 Jeff Layton 提交于 5月 08, 2013

...to make it clear what the intent behind each record's operation was.

In many cases you can infer this, based on the context of the syscall
and the result. In other cases it's not so obvious. For instance, in
the case where you have a file being renamed over another, you'll have
two different records with the same filename but different inode info.
By logging this information we can clearly tell which one was created
and which was deleted.

This fixes what was broken in commit bfcec708.
Commit 79f6530c should also be backported to stable v3.7+.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

d3aea84a

audit: use given values in tty_audit enable api · b95d77fe

由 Richard Guy Briggs 提交于 5月 03, 2013

In send/GET, we don't want the kernel to lie about what value is set.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

b95d77fe

audit: use nlmsg_len() to get message payload length · 4d8fe737

由 Mathias Krause 提交于 9月 30, 2013

Using the nlmsg_len member of the netlink header to test if the message
is valid is wrong as it includes the size of the netlink header itself.
Thereby allowing to send short netlink messages that pass those checks.

Use nlmsg_len() instead to test for the right message length. The result
of nlmsg_len() is guaranteed to be non-negative as the netlink message
already passed the checks of nlmsg_ok().

Also switch to min_t() to please checkpatch.pl.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org  # v2.6.6+ for the 1st hunk, v2.6.23+ for the 2nd
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

4d8fe737

audit: use memset instead of trying to initialize field by field · e13f91e3

由 Eric Paris 提交于 11月 05, 2013

We currently are setting fields to 0 to initialize the structure
declared on the stack.  This is a bad idea as if the structure has holes
or unpacked space these will not be initialized.  Just use memset.  This
is not a performance critical section of code.
Signed-off-by: NEric Paris <eparis@redhat.com>

e13f91e3

audit: fix info leak in AUDIT_GET requests · 64fbff9a

由 Mathias Krause 提交于 9月 30, 2013

We leak 4 bytes of kernel stack in response to an AUDIT_GET request as
we miss to initialize the mask member of status_set. Fix that.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org  # v2.6.6+
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

64fbff9a

audit: update AUDIT_INODE filter rule to comparator function · db510fc5

由 Richard Guy Briggs 提交于 7月 04, 2013

It appears this one comparison function got missed in f368c07d (and 9c937dcc).
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

db510fc5

audit: audit feature to set loginuid immutable · 21b85c31

由 Eric Paris 提交于 5月 23, 2013

This adds a new 'audit_feature' bit which allows userspace to set it
such that the loginuid is absolutely immutable, even if you have
CAP_AUDIT_CONTROL.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

21b85c31

audit: audit feature to only allow unsetting the loginuid · d040e5af

由 Eric Paris 提交于 5月 24, 2013

This is a new audit feature which only grants processes with
CAP_AUDIT_CONTROL the ability to unset their loginuid.  They cannot
directly set it from a valid uid to another valid uid.  The ability to
unset the loginuid is nice because a priviledged task, like that of
container creation, can unset the loginuid and then priv is not needed
inside the container when a login daemon needs to set the loginuid.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

d040e5af

audit: allow unsetting the loginuid (with priv) · 81407c84

由 Eric Paris 提交于 5月 24, 2013

If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid.
This would allow a child of that task to set their loginuid without
CAP_AUDIT_CONTROL.  Thus when launching a new login daemon, a
priviledged helper would be able to unset the loginuid and then the
daemon, which may be malicious user facing, do not need priv to function
correctly.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

81407c84

audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE · 83fa6bbe

由 Eric Paris 提交于 5月 24, 2013

After trying to use this feature in Fedora we found the hard coding
policy like this into the kernel was a bad idea. Surprise surprise.
We ran into these problems because it was impossible to launch a
container as a logged in user and run a login daemon inside that container.
This reverts back to the old behavior before this option was added. The
option will be re-added in a userspace selectable manor such that
userspace can choose when it is and when it is not appropriate.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

83fa6bbe

audit: loginuid functions coding style · da0a6104

由 Eric Paris 提交于 5月 24, 2013

This is just a code rework.  It makes things more readable.  It does not
make any functional changes.

It does change the log messages to include both the old session id as
well the new and it includes a new res field, which means we get
messages even when the user did not have permission to change the
loginuid.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

da0a6104

audit: implement generic feature setting and retrieving · b0fed402

由 Eric Paris 提交于 5月 22, 2013

The audit_status structure was not designed with extensibility in mind.
Define a new AUDIT_SET_FEATURE message type which takes a new structure
of bits where things can be enabled/disabled/locked one at a time.  This
structure should be able to grow in the future while maintaining forward
and backward compatibility (based loosly on the ideas from capabilities
and prctl)

This does not actually add any features, but is just infrastructure to
allow new on/off types of audit system features.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

b0fed402

audit: change decimal constant to macro for invalid uid · 42f74461

由 Richard Guy Briggs 提交于 5月 20, 2013

SFR reported this 2013-05-15:

> After merging the final tree, today's linux-next build (i386 defconfig)
> produced this warning:
>
> kernel/auditfilter.c: In function 'audit_data_to_entry':
> kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only
> in ISO C90 [enabled by default]
>
> Introduced by commit 780a7654 ("audit: Make testing for a valid
> loginuid explicit") from Linus' tree.

Replace this decimal constant in the code with a macro to make it more readable
(add to the unsigned cast to quiet the warning).

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

42f74461

audit: printk USER_AVC messages when audit isn't enabled · 0868a5e1

由 Tyler Hicks 提交于 7月 25, 2013

When the audit=1 kernel parameter is absent and auditd is not running,
AUDIT_USER_AVC messages are being silently discarded.

AUDIT_USER_AVC messages should be sent to userspace using printk(), as
mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the
audit-disabled case for discarding user messages").

When audit_enabled is 0, audit_receive_msg() discards all user messages
except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg()
refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to
special case AUDIT_USER_AVC messages in both functions.

It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()")
introduced this bug.

Cc: <stable@kernel.org> # v2.6.25+
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

0868a5e1

audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context · d48d8051

由 Oleg Nesterov 提交于 9月 15, 2013

If audit_filter_task() nacks the new thread it makes sense
to clear TIF_SYSCALL_AUDIT which can be copied from parent
by dup_task_struct().

A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers
the "slow" audit paths in entry.S to ensure the task can not
miss audit_syscall_*() calls, this is pointless if the task
has no ->audit_context.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NSteve Grubb <sgrubb@redhat.com>
Acked-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

d48d8051

Audit: remove duplicate comments · af0e493d

由 Gao feng 提交于 9月 23, 2013

Remove it.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

af0e493d

audit: remove newline accidentally added during session id helper refactor · b8f89caa

由 Richard Guy Briggs 提交于 9月 18, 2013

A newline was accidentally added during session ID helper refactorization in
commit 4d3fb709.  This needlessly uses up buffer space, messes up syslog
formatting and makes userspace processing less efficient.  Remove it.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Acked-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

b8f89caa

audit: remove duplicate inclusion of the netlink header · 47145705

由 Ilya V. Matveychikov 提交于 9月 29, 2013

Signed-off-by: NIlya V. Matveychikov <matvejchikov@gmail.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

47145705

audit: format user messages to size of MAX_AUDIT_MESSAGE_LENGTH · b50eba7e

由 Richard Guy Briggs 提交于 9月 16, 2013

Messages of type AUDIT_USER_TTY were being formatted to 1024 octets,
truncating messages approaching MAX_AUDIT_MESSAGE_LENGTH (8970 octets).

Set the formatting to 8560 characters, given maximum estimates for prefix and
suffix budgets.

See the problem discussion:
https://www.redhat.com/archives/linux-audit/2009-January/msg00030.html

And the new size rationale:
https://www.redhat.com/archives/linux-audit/2013-September/msg00016.html

Test ~8k messages with:
auditctl -m "$(for i in $(seq -w 001 820);do echo -n "${i}0______";done)"
Reported-by: NLC Bruzenak <lenny@magitekltd.com>
Reported-by: NJustin Stephenson <jstephen@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

b50eba7e

29 10月, 2013 1 次提交

perf: Fix perf ring buffer memory ordering · bf378d34

由 Peter Zijlstra 提交于 10月 28, 2013

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: NVictor Kaplansky <victork@il.ibm.com>
Tested-by: NVictor Kaplansky <victork@il.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

bf378d34

25 10月, 2013 1 次提交

PM / hibernate: Move software_resume to late_initcall_sync · d3c345db

由 Russ Dill 提交于 10月 24, 2013

software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.

Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.
Signed-off-by: NRuss Dill <Russ.Dill@ti.com>
Acked-by: NPavel Machek <Pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d3c345db

23 10月, 2013 1 次提交

clockevents: Sanitize ticks to nsec conversion · 97b94106

由 Thomas Gleixner 提交于 9月 24, 2013

Marc Kleine-Budde pointed out, that commit 77cc982f "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.

The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.

The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer divison in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.

Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:

      u64 clc = (latch << evt->shift) / evt->mult;

So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the divison is:

       u64 clc = (latch << evt->shift) + evt->mult - 1;

So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.

[ukl: move assignment to rnd after eventually changing mult, fix build
 issue and correct comment with the right math]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.deSigned-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>

97b94106

19 10月, 2013 1 次提交

mutex: Avoid gcc version dependent __builtin_constant_p() usage · b0267507

由 Tetsuo Handa 提交于 10月 17, 2013

Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot
handle such expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.

Fix it by explicitly passing a bool which tells whether p != NULL
or not.

[ PeterZ: This is a sad patch, but provided it actually generates
          similar code I suppose its the best we can do bar whole
	  sale deprecating gcc-3. ]
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: peterz@infradead.org
Cc: imirkin@alum.mit.edu
Cc: daniel.vetter@ffwll.ch
Cc: robdclark@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jpSigned-off-by: NIngo Molnar <mingo@kernel.org>

b0267507

18 10月, 2013 1 次提交

perf: Disable PERF_RECORD_MMAP2 support · 3090ffb5

由 Stephane Eranian 提交于 10月 17, 2013

For now, we disable the extended MMAP record support (MMAP2).

We have identified cases where it would not report the correct mapping
information, clone(VM_CLONE) but with separate pids.  We will revisit
the support once we find a solution for this case.

The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.

The support will be revisited once the kenrel interface is updated.

In V2, we reduce the patch to the strict minimum.

In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131017173215.GA8820@quadSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3090ffb5

14 10月, 2013 1 次提交

cgroup: fix to break the while loop in cgroup_attach_task() correctly · ea84753c

由 Anjana V Kumar 提交于 10月 12, 2013

Both Anjana and Eunki reported a stall in the while_each_thread loop
in cgroup_attach_task().

It's because, when we attach a single thread to a cgroup, if the cgroup
is exiting or is already in that cgroup, we won't break the loop.

If the task is already in the cgroup, the bug can lead to another thread
being attached to the cgroup unexpectedly:

  # echo 5207 > tasks
  # cat tasks
  5207
  # echo 5207 > tasks
  # cat tasks
  5207
  5215

What's worse, if the task to be attached isn't the leader of the thread
group, we might never exit the loop, hence cpu stall. Thanks for Oleg's
analysis.

This bug was introduced by commit 081aa458
("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()")

[ lizf: - fixed the first continue, pointed out by Oleg,
        - rewrote changelog. ]

Cc: <stable@vger.kernel.org> # 3.9+
Reported-by: NEunki Kim <eunki_kim@samsung.com>
Reported-by: NAnjana V Kumar <anjanavk12@gmail.com>
Signed-off-by: NAnjana V Kumar <anjanavk12@gmail.com>
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

ea84753c

04 10月, 2013 1 次提交

perf: Fix perf_pmu_migrate_context · 9886167d

由 Peter Zijlstra 提交于 10月 03, 2013

While auditing the list_entry usage due to a trinity bug I found that
perf_pmu_migrate_context violates the rules for
perf_event::event_entry.

The problem is that perf_event::event_entry is a RCU list element, and
hence we must wait for a full RCU grace period before re-using the
element after deletion.

Therefore the usage in perf_pmu_migrate_context() which re-uses the
entry immediately is broken. For now introduce another list_head into
perf_event for this specific usage.

This doesn't actually fix the trinity report because that never goes
through this code.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-mkj72lxagw1z8fvjm648iznw@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

9886167d

01 10月, 2013 4 次提交

irq: Force hardirq exit's softirq processing on its own stack · ded79754

由 Frederic Weisbecker 提交于 9月 24, 2013

The commit facd8b80
("irq: Sanitize invoke_softirq") converted irq exit
calls of do_softirq() to __do_softirq() on all architectures,
assuming it was only used there for its irq disablement
properties.

But as a side effect, the softirqs processed in the end
of the hardirq are always called on the inline current
stack that is used by irq_exit() instead of the softirq
stack provided by the archs that override do_softirq().

The result is mostly safe if the architecture runs irq_exit()
on a separate irq stack because then softirqs are processed
on that same stack that is near empty at this stage (assuming
hardirq aren't nesting).

Otherwise irq_exit() runs in the task stack and so does the softirq
too. The interrupted call stack can be randomly deep already and
the softirq can dig through it even further. To add insult to the
injury, this softirq can be interrupted by a new hardirq, maximizing
the chances for a stack overrun as reported in powerpc for example:

	do_IRQ: stack overflow: 1920
	CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
	Call Trace:
	[c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
	[c0000000050a8810] .dump_stack+0x28/0x3c
	[c0000000050a8880] .do_IRQ+0x2b8/0x2c0
	[c0000000050a8930] hardware_interrupt_common+0x154/0x180
	--- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
		LR = .cp_start_xmit+0x390/0x820 [8139cp]
	[c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
	[c0000000050a8e00] .sch_direct_xmit+0x110/0x260
	[c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
	[c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
	[c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
	[c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
	[c0000000050a9130] .dev_queue_xmit+0x428/0x630
	[c0000000050a91d0] .ip_finish_output+0x2a4/0x550
	[c0000000050a9290] .ip_local_out+0x50/0x70
	[c0000000050a9310] .ip_queue_xmit+0x148/0x420
	[c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
	[c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
	[c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
	[c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
	[c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
	[c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
	[c0000000050a9840] .ip_rcv_finish+0x148/0x400
	[c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
	[c0000000050a99d0] .netif_receive_skb+0x44/0x110
	[c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
	[c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
	[c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
	[c0000000050a9c70] .nf_iterate+0x114/0x130
	[c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
	[c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
	[c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
	[c0000000050a9fa0] .netif_receive_skb+0x44/0x110
	[c0000000050aa040] .napi_gro_receive+0xe8/0x120
	[c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
	[c0000000050aa1d0] .net_rx_action+0x1dc/0x310
	[c0000000050aa2b0] .__do_softirq+0x158/0x330
	[c0000000050aa3b0] .irq_exit+0xc8/0x110
	[c0000000050aa430] .do_IRQ+0xdc/0x2c0
	[c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
	 --- Exception: 501 at .bad_range+0x1c/0x110
		 LR = .get_page_from_freelist+0x908/0xbb0
	[c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
	[c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
	[c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
	[c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
	[c0000000050aac60] .handle_pte_fault+0x814/0xb70
	[c0000000050aad50] .__get_user_pages+0x1a4/0x640
	[c0000000050aae60] .get_user_pages_fast+0xec/0x160
	[c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
	[c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
	[c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
	[c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
	[c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
	[c0000000050ab320]  kvm_start_lightweight+0x1ec/0x1fc [kvm]
	[c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
	[c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
	[c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
	[c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
	[c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
	[c0000000050abd80] .SyS_ioctl+0xd4/0xf0
	[c0000000050abe30] syscall_exit+0x0/0x98

Since this is a regression, this patch proposes a minimalistic
and low-risk solution by blindly forcing the hardirq exit processing of
softirqs on the softirq stack. This way we should reduce significantly
the opportunities for task stack overflow dug by softirqs.

Longer term solutions may involve extending the hardirq stack coverage to
irq_exit(), etc...
Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: #3.9.. <stable@vger.kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>

ded79754

pidns: fix free_pid() to handle the first fork failure · 314a8ad0

由 Oleg Nesterov 提交于 9月 30, 2013

"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

314a8ad0

kernel/kmod.c: check for NULL in call_usermodehelper_exec() · 4c1c7be9

由 Tetsuo Handa 提交于 9月 30, 2013

If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer
dereference happens upon core dump because argv_split("") returns
argv[0] == NULL.

This bug was once fixed by commit 264b83c0 ("usermodehelper: check
subprocess_info->path != NULL") but was by error reintroduced by commit
7f57cfa4 ("usermodehelper: kill the sub_info->path[0] check").

This bug seems to exist since 2.6.19 (the version which core dump to
pipe was added).  Depending on kernel version and config, some side
effect might happen immediately after this oops (e.g.  kernel panic with
2.6.32-358.18.1.el6).
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c1c7be9

PM / hibernate: Fix user space driven resume regression · aab17289

由 Rafael J. Wysocki 提交于 9月 30, 2013

Recent commit 8fd37a4c (PM / hibernate: Create memory bitmaps after
freezing user space) broke the resume part of the user space driven
hibernation (s2disk), because I forgot that the resume utility
loaded the image into memory without freezing user space (it still
freezes tasks after loading the image).  This means that during user
space driven resume we need to create the memory bitmaps at the
"device open" time rather than at the "freeze tasks" time, so make
that happen (that's a special case anyway, so it needs to be treated
in a special way).
Reported-and-tested-by: NRonald <ronald645@gmail.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

aab17289

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功