提交 · 4766b199ef9e1ca6316ee4f8f9d80c2ba1ed0290 · openeuler / raspberrypi-kernel

24 2月, 2015 5 次提交

audit: consolidate handling of mm->exe_file · 4766b199

由 Davidlohr Bueso 提交于 2月 22, 2015

This patch adds a audit_log_d_path_exe() helper function
to share how we handle auditing of the exe_file's path.
Used by both audit and auditsc. No functionality is changed.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

4766b199

audit: code clean up · 5985de67

由 Ameen Ali 提交于 2月 23, 2015

Fixed a coding style issue (unnecessary parentheses , unnecessary braces)
Signed-off-by: NAmeen-Ali <Ameenali023@gmail.com>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

5985de67

audit: don't reset working wait time accidentally with auditd · efef73a1

由 Richard Guy Briggs 提交于 2月 23, 2015

During a queue overflow condition while we are waiting for auditd to drain the
queue to make room for regular messages, we don't want a successful auditd that
has bypassed the queue check to reset the backlog wait time.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

efef73a1

audit: don't lose set wait time on first successful call to audit_log_start() · a77ed4e5

由 Richard Guy Briggs 提交于 2月 23, 2015

Copy the set wait time to a working value to avoid losing the set
value if the queue overflows.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

a77ed4e5

audit: move the tree pruning to a dedicated thread · f1aaf262

由 Imre Palik 提交于 2月 23, 2015

When file auditing is enabled, during a low memory situation, a memory
allocation with __GFP_FS can lead to pruning the inode cache.  Which can,
in turn lead to audit_tree_freeing_mark() being called.  This can call
audit_schedule_prune(), that tries to fork a pruning thread, and
waits until the thread is created.  But forking needs memory, and the
memory allocations there are done with __GFP_FS.

So we are waiting merrily for some __GFP_FS memory allocations to complete,
while holding some filesystem locks.  This can take a while ...

This patch creates a single thread for pruning the tree from
audit_add_tree_rule(), and thus avoids the deadlock that the on-demand
thread creation can cause.
Reported-by: NMatt Wilson <msw@amazon.com>
Cc: Matt Wilson <msw@amazon.com>
Signed-off-by: NImre Palik <imrep@amazon.de>
Reviewed-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

f1aaf262

20 1月, 2015 1 次提交

audit: remove vestiges of vers_ops · 2fded7f4

由 Richard Guy Briggs 提交于 12月 23, 2014

Should have been removed with commit 18900909 ("audit: remove the old
depricated kernel interface").
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

2fded7f4

30 12月, 2014 1 次提交

audit: create private file name copies when auditing inodes · fcf22d82

由 Paul Moore 提交于 12月 30, 2014

Unfortunately, while commit 4a928436 ("audit: correctly record file
names with different path name types") fixed a problem where we were
not recording filenames, it created a new problem by attempting to use
these file names after they had been freed.  This patch resolves the
issue by creating a copy of the filename which the audit subsystem
frees after it is done with the string.

At some point it would be nice to resolve this issue with refcounts,
or something similar, instead of having to allocate/copy strings, but
that is almost surely beyond the scope of a -rcX patch so we'll defer
that for later.  On the plus side, only audit users should be impacted
by the string copying.
Reported-by: NToralf Foerster <toralf.foerster@gmx.de>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

fcf22d82

24 12月, 2014 1 次提交

audit: restore AUDIT_LOGINUID unset ABI · 041d7b98

由 Richard Guy Briggs 提交于 12月 23, 2014

A regression was caused by commit 780a7654:
	 audit: Make testing for a valid loginuid explicit.
(which in turn attempted to fix a regression caused by e1760bd5)

When audit_krule_to_data() fills in the rules to get a listing, there was a
missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.

This broke userspace by not returning the same information that was sent and
expected.

The rule:
	auditctl -a exit,never -F auid=-1
gives:
	auditctl -l
		LIST_RULES: exit,never f24=0 syscall=all
when it should give:
		LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all

Tag it so that it is reported the same way it was set.  Create a new
private flags audit_krule field (pflags) to store it that won't interact with
the public one from the API.

Cc: stable@vger.kernel.org # v3.10-rc1+
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

041d7b98

23 12月, 2014 1 次提交

audit: correctly record file names with different path name types · 4a928436

由 Paul Moore 提交于 12月 22, 2014

There is a problem with the audit system when multiple audit records
are created for the same path, each with a different path name type.
The root cause of the problem is in __audit_inode() when an exact
match (both the path name and path name type) is not found for a
path name record; the existing code creates a new path name record,
but it never sets the path name in this record, leaving it NULL.
This patch corrects this problem by assigning the path name to these
newly created records.

There are many ways to reproduce this problem, but one of the
easiest is the following (assuming auditd is running):

  # mkdir /root/tmp/test
  # touch /root/tmp/test/567
  # auditctl -a always,exit -F dir=/root/tmp/test
  # touch /root/tmp/test/567

Afterwards, or while the commands above are running, check the audit
log and pay special attention to the PATH records.  A faulty kernel
will display something like the following for the file creation:

  type=SYSCALL msg=audit(1416957442.025:93): arch=c000003e syscall=2
    success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
  type=CWD msg=audit(1416957442.025:93):  cwd="/root/tmp"
  type=PATH msg=audit(1416957442.025:93): item=0 name="test/"
    inode=401409 ... nametype=PARENT
  type=PATH msg=audit(1416957442.025:93): item=1 name=(null)
    inode=393804 ... nametype=NORMAL
  type=PATH msg=audit(1416957442.025:93): item=2 name=(null)
    inode=393804 ... nametype=NORMAL

While a patched kernel will show the following:

  type=SYSCALL msg=audit(1416955786.566:89): arch=c000003e syscall=2
    success=yes exit=3 ... comm="touch" exe="/usr/bin/touch"
  type=CWD msg=audit(1416955786.566:89):  cwd="/root/tmp"
  type=PATH msg=audit(1416955786.566:89): item=0 name="test/"
    inode=401409 ... nametype=PARENT
  type=PATH msg=audit(1416955786.566:89): item=1 name="test/567"
    inode=393804 ... nametype=NORMAL

This issue was brought up by a number of people, but special credit
should go to hujianyang@huawei.com for reporting the problem along
with an explanation of the problem and a patch.  While the original
patch did have some problems (see the archive link below), it did
demonstrate the problem and helped kickstart the fix presented here.

  * https://lkml.org/lkml/2014/9/5/66Reported-by: Nhujianyang <hujianyang@huawei.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>
Acked-by: NRichard Guy Briggs <rgb@redhat.com>

4a928436

20 12月, 2014 2 次提交

audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb · 54dc77d9

由 Richard Guy Briggs 提交于 12月 18, 2014

Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
audit_log_end(), which can come from any context (aka even a sleeping context)
GFP_KERNEL can't be used.  Since the audit_buffer knows what context it should
use, pass that down and use that.

See: https://lkml.org/lkml/2014/12/16/542

BUG: sleeping function called from invalid context at mm/slab.c:2849
in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
2 locks held by sulogin/885:
  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
  #1:  (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
  ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
  ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
  0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
Call Trace:
  [<ffffffff916ba529>] dump_stack+0x50/0xa8
  [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
  [<ffffffff910632a6>] __might_sleep+0x119/0x128
  [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
  [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
  [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
  [<ffffffff914e2b62>] skb_copy+0x3e/0xa3
  [<ffffffff910c263e>] audit_log_end+0x83/0x100
  [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
  [<ffffffff91252a73>] common_lsm_audit+0x441/0x450
  [<ffffffff9123c163>] slow_avc_audit+0x63/0x67
  [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
  [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
  [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
  [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
  [<ffffffff911515e6>] install_exec_creds+0xe/0x79
  [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
  [<ffffffff9115198e>] search_binary_handler+0x81/0x18c
  [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
  [<ffffffff91153669>] do_execve+0x1f/0x21
  [<ffffffff91153967>] SyS_execve+0x25/0x29
  [<ffffffff916c61a9>] stub_execve+0x69/0xa0

Cc: stable@vger.kernel.org #v3.16-rc1
Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Tested-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

54dc77d9

audit: don't attempt to lookup PIDs when changing PID filtering audit rules · 3640dcfa

由 Paul Moore 提交于 12月 19, 2014

Commit f1dc4867 ("audit: anchor all pid references in the initial pid
namespace") introduced a find_vpid() call when adding/removing audit
rules with PID/PPID filters; unfortunately this is problematic as
find_vpid() only works if there is a task with the associated PID
alive on the system.  The following commands demonstrate a simple
reproducer.

	# auditctl -D
	# auditctl -l
	# autrace /bin/true
	# auditctl -l

This patch resolves the problem by simply using the PID provided by
the user without any additional validation, e.g. no calls to check to
see if the task/PID exists.

Cc: stable@vger.kernel.org # 3.15
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>
Acked-by: NEric Paris <eparis@redhat.com>
Reviewed-by: NRichard Guy Briggs <rgb@redhat.com>

3640dcfa

18 11月, 2014 1 次提交

audit: convert status version to a feature bitmap · 0288d718

由 Richard Guy Briggs 提交于 11月 17, 2014

The version field defined in the audit status structure was found to have
limitations in terms of its expressibility of features supported.  This is
distict from the get/set features call to be able to command those features
that are present.

Converting this field from a version number to a feature bitmap will allow
distributions to selectively backport and support certain features and will
allow upstream to be able to deprecate features in the future.  It will allow
userspace clients to first query the kernel for which features are actually
present and supported.  Currently, EINVAL is returned rather than EOPNOTSUP,
which isn't helpful in determining if there was an error in the command, or if
it simply isn't supported yet.  Past features are not represented by this
bitmap, but their use may be converted to EOPNOTSUP if needed in the future.

Since "version" is too generic to convert with a #define, use a union in the
struct status, introducing the member "feature_bitmap" unionized with
"version".

Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP*
counterparts, leaving the former for backwards compatibility.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: minor whitespace tweaks]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

0288d718

12 11月, 2014 1 次提交

audit: keep inode pinned · 799b6014

由 Miklos Szeredi 提交于 11月 04, 2014

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a5 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org # 2.6.36+
Signed-off-by: NPaul Moore <pmoore@redhat.com>

799b6014

31 10月, 2014 1 次提交

audit: AUDIT_FEATURE_CHANGE message format missing delimiting space · 897f1acb

由 Richard Guy Briggs 提交于 10月 30, 2014

Add a space between subj= and feature= fields to make them parsable.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaul Moore <pmoore@redhat.com>

897f1acb

11 10月, 2014 4 次提交

audit: rename audit_log_remove_rule to disambiguate for trees · 2991dd2b

由 Richard Guy Briggs 提交于 10月 02, 2014

Rename audit_log_remove_rule() to audit_tree_log_remove_rule() to avoid
confusion with watch and mark rule removal/changes.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

2991dd2b

audit: cull redundancy in audit_rule_change · e85322d2

由 Richard Guy Briggs 提交于 10月 02, 2014

Re-factor audit_rule_change() to reduce the amount of code redundancy and
simplify the logic.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

e85322d2

E
audit: WARN if audit_rule_change called illegally · 739c9503
由 Eric Paris 提交于 10月 10, 2014
```
Signed-off-by: NEric Paris <eparis@redhat.com>
```
739c9503

audit: put rule existence check in canonical order · 3639f170

由 Richard Guy Briggs 提交于 10月 02, 2014

Use same rule existence check order as audit_make_tree(), audit_to_watch(),
update_lsm_rule() for legibility.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

3639f170

24 9月, 2014 13 次提交

audit: get comm using lock to avoid race in string printing · 9eab339b

由 Richard Guy Briggs 提交于 3月 15, 2014

When task->comm is passed directly to audit_log_untrustedstring() without
getting a copy or using the task_lock, there is a race that could happen that
would output a NULL (\0) in the output string that would effectively truncate
the rest of the report text after the comm= field in the audit, losing fields.

Use get_task_comm() to get a copy while acquiring the task_lock to prevent
this and to prevent the result from being a mixture of old and new values of
comm.
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

9eab339b

audit: remove open_arg() function that is never used · f874738e

由 Richard Guy Briggs 提交于 9月 15, 2014

open_arg() was added in commit 55669bfa "audit: AUDIT_PERM support"
and never used.  Remove it.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

f874738e

audit: correct AUDIT_GET_FEATURE return message type · 9ef91514

由 Richard Guy Briggs 提交于 8月 24, 2014

When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature.  The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.

This appears to have been a cut-and-paste-eo in commit b0fed402.
Reported-by: NSteve Grubb <sgrubb@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

9ef91514

audit: set nlmsg_len for multicast messages. · 54e05edd

由 Richard Guy Briggs 提交于 8月 21, 2014

Report:
	Looking at your example code in
	http://people.redhat.com/rbriggs/audit-multicast-listen/audit-multicast-listen.c,
	it seems that nlmsg_len field in the received messages is supposed to
	contain the length of the header + payload, but it is always set to the
	size of the header only, i.e. 16. The example program works, because
	the printf format specifies the minimum width, not "precision", so it
	simply prints out the payload until the first zero byte. This isn't too
	much of a problem, but precludes the use of recvmmsg, iiuc?

	(gdb) p *(struct nlmsghdr*)nlh
	$14 = {nlmsg_len = 16, nlmsg_type = 1100, nlmsg_flags = 0, nlmsg_seq = 0, nlmsg_pid = 9910}

The only time nlmsg_len would have been updated was at audit_buffer_alloc()
inside audit_log_start() and never updated after.  It should arguably be done
in audit_log_vformat(), but would be more efficient in audit_log_end().
Reported-by: NZbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

54e05edd

audit: use union for audit_field values since they are mutually exclusive · 219ca394

由 Richard Guy Briggs 提交于 3月 26, 2014

Since only one of val, uid, gid and lsm* are used at any given time, combine
them to reduce the size of the struct audit_field.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

219ca394

audit: invalid op= values for rules · e7df61f4

由 Burn Alting 提交于 4月 04, 2014

Various audit events dealing with adding, removing and updating rules result in
invalid values set for the op keys which result in embedded spaces in op=
values.

The invalid values are
        op="add rule"       set in kernel/auditfilter.c
        op="remove rule"    set in kernel/auditfilter.c
        op="remove rule"    set in kernel/audit_tree.c
        op="updated rules"  set in kernel/audit_watch.c
        op="remove rule"    set in kernel/audit_watch.c

Replace the space in the above values with an underscore character ('_').
Coded-by: NBurn Alting <burn@swtf.dyndns.org>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

e7df61f4

audit: use atomic_t to simplify audit_serial() · 01478d7d

由 Richard Guy Briggs 提交于 6月 13, 2014

Since there is already a primitive to do this operation in the atomic_t, use it
to simplify audit_serial().
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

01478d7d

kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0] · 6eed9b26

由 Fabian Frederick 提交于 6月 03, 2014

Use kernel.h definition.

Cc: Eric Paris <eparis@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

6eed9b26

audit: reduce scope of audit_log_fcaps · 691e6d59

由 Richard Guy Briggs 提交于 5月 26, 2014

audit_log_fcaps() isn't used outside kernel/audit.c.  Reduce its scope.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

691e6d59

audit: reduce scope of audit_net_id · c0a8d9b0

由 Richard Guy Briggs 提交于 5月 26, 2014

audit_net_id isn't used outside kernel/audit.c.  Reduce its scope.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

c0a8d9b0

audit: x86: drop arch from __audit_syscall_entry() interface · b4f0d375

由 Richard Guy Briggs 提交于 3月 04, 2014

Since the arch is found locally in __audit_syscall_entry(), there is no need to
pass it in as a parameter.  Delete it from the parameter list.

x86* was the only arch to call __audit_syscall_entry() directly and did so from
assembly code.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-audit@redhat.com
Signed-off-by: NEric Paris <eparis@redhat.com>

---

As this patch relies on changes in the audit tree, I think it
appropriate to send it through my tree rather than the x86 tree.

b4f0d375

audit: add arch field to seccomp event log · 84db564a

由 Richard Guy Briggs 提交于 1月 29, 2014

The AUDIT_SECCOMP record looks something like this:

type=SECCOMP msg=audit(1373478171.953:32775): auid=4325 uid=4325 gid=4325 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0 pid=12381 comm="test" sig=31 syscall=231 compat=0 ip=0x39ea8bca89 code=0x0

In order to determine what syscall 231 maps to, we need to have the arch= field right before it.

To see the event, compile this test.c program:

=====
int main(void)
{
        return seccomp_load(seccomp_init(SCMP_ACT_KILL));
}
=====

gcc -g test.c -o test -lseccomp

After running the program, find the record by:  ausearch --start recent -m SECCOMP -i
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
signed-off-by: NEric Paris <eparis@redhat.com>

84db564a

audit: __audit_syscall_entry: ignore arch arg and call syscall_get_arch() directly · 4a99854c

由 Richard Guy Briggs 提交于 2月 28, 2014

Since every arch should have syscall_get_arch() defined, stop using the
function argument and just collect this ourselves.  We do not drop the
argument as fixing some code paths (in assembly) to not pass this first
argument is non-trivial.  The argument will be dropped when that is
fixed.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

4a99854c

01 8月, 2014 1 次提交

timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks · 504d5874

由 Jan Kara 提交于 8月 01, 2014

clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04be #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
 (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c

but task is already holding lock:
 (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (hrtimer_bases.lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
       [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
       [<81080792>] task_clock_event_start+0x3a/0x3f
       [<810807a4>] task_clock_event_add+0xd/0x14
       [<8108259a>] event_sched_in+0xb6/0x17a
       [<810826a2>] group_sched_in+0x44/0x122
       [<81082885>] ctx_sched_in.isra.67+0x105/0x11f
       [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
       [<81082bf6>] __perf_install_in_context+0x8b/0xa3
       [<8107eb8e>] remote_function+0x12/0x2a
       [<8105f5af>] smp_call_function_single+0x2d/0x53
       [<8107e17d>] task_function_call+0x30/0x36
       [<8107fb82>] perf_install_in_context+0x87/0xbb
       [<810852c9>] SYSC_perf_event_open+0x5c6/0x701
       [<810856f9>] SyS_perf_event_open+0x17/0x19
       [<8142f8ee>] syscall_call+0x7/0xb

-> #4 (&ctx->lock){......}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

-> #3 (&rq->lock){-.-.-.}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f04c>] _raw_spin_lock+0x21/0x30
       [<81040873>] __task_rq_lock+0x33/0x3a
       [<8104184c>] wake_up_new_task+0x25/0xc2
       [<8102474b>] do_fork+0x15c/0x2a0
       [<810248a9>] kernel_thread+0x1a/0x1f
       [<814232a2>] rest_init+0x1a/0x10e
       [<817af949>] start_kernel+0x303/0x308
       [<817af2ab>] i386_start_kernel+0x79/0x7d

-> #2 (&p->pi_lock){-.-...}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<810413dd>] try_to_wake_up+0x1d/0xd6
       [<810414cd>] default_wake_function+0xb/0xd
       [<810461f3>] __wake_up_common+0x39/0x59
       [<81046346>] __wake_up+0x29/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #1 (&tty->write_wait){-.....}:
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<81046332>] __wake_up+0x15/0x3b
       [<811b8733>] tty_wakeup+0x49/0x51
       [<811c3568>] uart_write_wakeup+0x17/0x19
       [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
       [<811c5f28>] serial8250_handle_irq+0x54/0x6a
       [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
       [<811c56d8>] serial8250_interrupt+0x38/0x9e
       [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
       [<81051296>] handle_irq_event+0x2c/0x43
       [<81052cee>] handle_level_irq+0x57/0x80
       [<81002a72>] handle_irq+0x46/0x5c
       [<810027df>] do_IRQ+0x32/0x89
       [<8143036e>] common_interrupt+0x2e/0x33
       [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
       [<811c25a4>] uart_start+0x2d/0x32
       [<811c2c04>] uart_write+0xc7/0xd6
       [<811bc6f6>] n_tty_write+0xb8/0x35e
       [<811b9beb>] tty_write+0x163/0x1e4
       [<811b9cd9>] redirected_tty_write+0x6d/0x75
       [<810b6ed6>] vfs_write+0x75/0xb0
       [<810b7265>] SyS_write+0x44/0x77
       [<8142f8ee>] syscall_call+0x7/0xb

-> #0 (&port_lock_key){-.....}:
       [<8104a62d>] __lock_acquire+0x9ea/0xc6d
       [<8104a942>] lock_acquire+0x92/0x101
       [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
       [<811c60be>] serial8250_console_write+0x8c/0x10c
       [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
       [<8104f5d5>] console_unlock+0x1d7/0x398
       [<8104fb70>] vprintk_emit+0x3da/0x3e4
       [<81425f76>] printk+0x17/0x19
       [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
       [<8105c548>] clockevents_program_event+0xe7/0xf3
       [<8105cc1c>] tick_program_event+0x1e/0x23
       [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
       [<8103c49e>] __remove_hrtimer+0x5b/0x79
       [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
       [<8103cb4b>] hrtimer_cancel+0xd/0x18
       [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
       [<81080705>] task_clock_event_stop+0x20/0x64
       [<81080756>] task_clock_event_del+0xd/0xf
       [<81081350>] event_sched_out+0xab/0x11e
       [<810813e0>] group_sched_out+0x1d/0x66
       [<81081682>] ctx_sched_out+0xaf/0xbf
       [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
       [<8142cacc>] __schedule+0x4c6/0x4cb
       [<8142cae0>] schedule+0xf/0x11
       [<8142f9a6>] work_resched+0x5/0x30

other info that might help us debug this:

Chain exists of:
  &port_lock_key --> &ctx->lock --> hrtimer_bases.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hrtimer_bases.lock);
                               lock(&ctx->lock);
                               lock(hrtimer_bases.lock);
  lock(&port_lock_key);

 *** DEADLOCK ***

4 locks held by trinity-main/74:
 #0:  (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
 #1:  (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
 #2:  (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
 #3:  (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4

stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04be #2
 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
 [<81426f69>] dump_stack+0x16/0x18
 [<81425a99>] print_circular_bug+0x18f/0x19c
 [<8104a62d>] __lock_acquire+0x9ea/0xc6d
 [<8104a942>] lock_acquire+0x92/0x101
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
 [<811c60be>] ? serial8250_console_write+0x8c/0x10c
 [<811c60be>] serial8250_console_write+0x8c/0x10c
 [<8104af87>] ? lock_release+0x191/0x223
 [<811c6032>] ? wait_for_xmitr+0x76/0x76
 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118
 [<8104f5d5>] console_unlock+0x1d7/0x398
 [<8104fb70>] vprintk_emit+0x3da/0x3e4
 [<81425f76>] printk+0x17/0x19
 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116
 [<8105cc1c>] tick_program_event+0x1e/0x23
 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
 [<8103c49e>] __remove_hrtimer+0x5b/0x79
 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
 [<8103cb4b>] hrtimer_cancel+0xd/0x18
 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
 [<81080705>] task_clock_event_stop+0x20/0x64
 [<81080756>] task_clock_event_del+0xd/0xf
 [<81081350>] event_sched_out+0xab/0x11e
 [<810813e0>] group_sched_out+0x1d/0x66
 [<81081682>] ctx_sched_out+0xaf/0xbf
 [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
 [<8104416d>] ? __dequeue_entity+0x23/0x27
 [<81044505>] ? pick_next_task_fair+0xb1/0x120
 [<8142cacc>] __schedule+0x4c6/0x4cb
 [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
 [<810475b0>] ? trace_hardirqs_off+0xb/0xd
 [<81056346>] ? rcu_irq_exit+0x64/0x77

Fix the problem by using printk_deferred() which does not call into the
scheduler.
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

504d5874

31 7月, 2014 3 次提交

kexec: fix build error when hugetlbfs is disabled · 3a1122d2

由 David Rientjes 提交于 7月 30, 2014

free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no
need to filter PageHuge() page is such a configuration either, so avoid
exporting the symbol to fix a build error:

   In file included from kernel/kexec.c:14:0:
   kernel/kexec.c: In function 'crash_save_vmcoreinfo_init':
   kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function)
     VMCOREINFO_SYMBOL(free_huge_page);
                       ^

Introduced by commit 8f1d26d0 ("kexec: export free_huge_page to
VMCOREINFO")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Acked-by: NOlof Johansson <olof@lixom.net>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3a1122d2

Josh has moved · e0198b29

由 Josh Triplett 提交于 7月 30, 2014

My IBM email addresses haven't worked for years; also map some
old-but-functional forwarding addresses to my canonical address.

Update my GPG key fingerprint; I moved to 4096R a long time ago.

Update description.
Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0198b29

kexec: export free_huge_page to VMCOREINFO · 8f1d26d0

由 Atsushi Kumagai 提交于 7月 30, 2014

PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56b
("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need
another symbol to filter *hugetlbfs* pages.

If a user hope to filter user pages, makedumpfile tries to exclude them by
checking the condition whether the page is anonymous, but hugetlbfs pages
aren't anonymous while they also be user pages.

We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():

    int PageHuge(struct page *page)
    {
            if (!PageCompound(page))
                    return 0;

            page = compound_head(page);
            return get_compound_page_dtor(page) == free_huge_page;
    }

For that reason, this patch changes free_huge_page() into public
to export it to VMCOREINFO.
Signed-off-by: NAtsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Acked-by: NBaoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8f1d26d0

24 7月, 2014 1 次提交

sched_clock: Avoid corrupting hrtimer tree during suspend · f723aa18

由 Stephen Boyd 提交于 7月 23, 2014

During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.

Fixes: a08ca5d1 "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f723aa18

21 7月, 2014 1 次提交

tracing: Fix wraparound problems in "uptime" trace clock · 58d4e21e

由 Tony Luck 提交于 7月 18, 2014

The "uptime" trace clock added in:

    commit 8aacf017
    tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
        (u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com

Cc: stable@vger.kernel.org # 3.10+
Fixes: 8aacf017 "tracing: Add "uptime" trace clock that uses jiffies"
Signed-off-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

58d4e21e

18 7月, 2014 1 次提交

kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64 · d81b4253

由 Masami Hiramatsu 提交于 7月 17, 2014

On ia64 and ppc64, function pointers do not point to the
entry address of the function, but to the address of a
function descriptor (which contains the entry address and misc
data).

Since the kprobes code passes the function pointer stored
by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
initalizing its blacklist, it fails and reports many errors,
such as:

  Failed to find blacklist 0001013168300000
  Failed to find blacklist 0001013000f0a000
  [...]

To fix this bug, use arch_deref_entry_point() to get the
function entry address for kallsyms_lookup_size_offset()
instead of the raw function pointer.

Suzuki also pointed out that blacklist entries should also
be updated as well.
Reported-by: NTony Luck <tony.luck@gmail.com>
Fixed-by: NSuzuki K. Poulose <suzuki@in.ibm.com>
Tested-by: NTony Luck <tony.luck@intel.com>
Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (for powerpc)
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: sparse@chrisli.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: akataria@vmware.com
Cc: anil.s.keshavamurthy@intel.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: rdunlap@infradead.org
Cc: dl9pf@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140717114411.13401.2632.stgit@kbuild-fedora.novalocalSigned-off-by: NIngo Molnar <mingo@kernel.org>

d81b4253

16 7月, 2014 2 次提交

locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER · 5db6c6fe

由 Davidlohr Bueso 提交于 7月 11, 2014

Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
encapsulate the dependencies for rwsem optimistic spinning.
No logical changes here as it continues to depend on both
SMP and the XADD algorithm variant.
Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
Acked-by: NJason Low <jason.low2@hp.com>
[ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
Cc: aswin@hp.com
Cc: Chris Mason <clm@fb.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

5db6c6fe

locking/mutex: Disable optimistic spinning on some architectures · 4badad35

由 Peter Zijlstra 提交于 6月 06, 2014

The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.

There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.

Opt in for known good archs.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Reported-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: stable@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

4badad35