提交 · 56179a6ec65a56e0279a58e35cb450d38f061b94 · openeuler / raspberrypi-kernel

18 1月, 2012 9 次提交

audit: drop some potentially inadvisable likely notations · 56179a6e

由 Eric Paris 提交于 1月 03, 2012

The audit code makes heavy use of likely() and unlikely() macros, but they
don't always make sense.  Drop any that seem questionable and let the
computer do it's thing.
Signed-off-by: NEric Paris <eparis@redhat.com>

56179a6e

audit: remove AUDIT_SETUP_CONTEXT as it isn't used · 997f5b64

由 Eric Paris 提交于 1月 03, 2012

Audit contexts have 3 states.  Disabled, which doesn't collect anything,
build, which collects info but might not emit it, and record, which
collects and emits.  There is a 4th state, setup, which isn't used.  Get
rid of it.
Signed-off-by: NEric Paris <eparis@redhat.com>

997f5b64

audit: inline audit_syscall_entry to reduce burden on archs · b05d8447

由 Eric Paris 提交于 1月 03, 2012

Every arch calls:

if (unlikely(current->audit_context))
	audit_syscall_entry()

which requires knowledge about audit (the existance of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.
Signed-off-by: NEric Paris <eparis@redhat.com>

b05d8447

Audit: push audit success and retcode into arch ptrace.h · d7e7528b

由 Eric Paris 提交于 1月 03, 2012

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]

d7e7528b

seccomp: audit abnormal end to a process due to seccomp · 85e7bac3

由 Eric Paris 提交于 1月 03, 2012

The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.
Signed-off-by: NEric Paris <eparis@redhat.com>

85e7bac3

audit: check current inode and containing object when filtering on major and minor · 16c174bd

由 Eric Paris 提交于 1月 03, 2012

The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon.  Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now lets
say we add a watch with a filter on 8,1.  If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.

Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents.  We might hope that
this would also be logged, but it isn't.  The rules will check the
major,minor of the device containing /dev/sda1.  In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.

I believe these rules should trigger on either device.  The man page is
devoid of useful information about the intended semantics.  It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...
Signed-off-by: NEric Paris <eparis@redhat.com>

16c174bd

audit: drop the meaningless and format breaking word 'user' · 3035c51e

由 Eric Paris 提交于 1月 03, 2012

userspace audit messages look like so:

type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''

That third field just says 'user'. That's useless and doesn't follow the
key=value pair we are trying to enforce. We already know it came from the
user based on the record type. Kill that word. Die.
Signed-off-by: NEric Paris <eparis@redhat.com>

3035c51e

audit: dynamically allocate audit_names when not enough space is in the names array · 5195d8e2

由 Eric Paris 提交于 1月 03, 2012

This patch does 2 things.  First it reduces the number of audit_names
allocated in every audit context from 20 to 5.  5 should be enough for all
'normal' syscalls (rename being the worst).  Some syscalls can still touch
more the 5 inodes such as mount.  When rpc filesystem is mounted it will
create inodes and those can exceed 5.  To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5.  This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.
Signed-off-by: NEric Paris <eparis@redhat.com>

5195d8e2

audit: make filetype matching consistent with other filters · 5ef30ee5

由 Eric Paris 提交于 1月 03, 2012

Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list. The filetype matching
however had a strange way of doing things. It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel. Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea. As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field. The kernel always checked the first
name collected which for the tested rules was always correct.

This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected. It
also changes the rule validation to reject the old unused rule types.

Noone knew it was there. Noone used it. Why keep around the extra code?
Signed-off-by: NEric Paris <eparis@redhat.com>

5ef30ee5

11 1月, 2012 5 次提交

user namespace: make signal.c respect user namespaces · 6b550f94

由 Serge E. Hallyn 提交于 1月 10, 2012

ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
user namespace. (new, thanks Oleg)

__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)

do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace

ptrace_signal maps parent's uid into current's user namespace before
including in signal to current.  IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.

Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
	simplify callers and help make clear what we are translating
        (which uid into which namespace).  Passing the target task would
	make callers even easier to read, but we pass in user_ns because
	current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
	in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
	ancestor or unrelated user namespace.  Pass that on to __send_signal,
	which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
	SI_FROMKERNEL cases at callers, because we can't assume sender is
	current in those cases.
Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Serge Hallyn <serge.hallyn@canonical.com>
Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)

Eric Biederman pointed out that passing info is a bug and could lead to a
NULL pointer deref to boot.

A collection of signal, securebits, filecaps, cap_bounds, and a few other
ltp tests passed with this kernel.

Changelog:
    Nov 18: previous patch missed a leading '&'
Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: ipc/mqueue: lock() => unlock() typo

There was a double lock typo introduced in b085f4bd6b21 "user namespace:
make signal.c respect user namespaces"
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NSerge Hallyn <serge@hallyn.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6b550f94

workqueue: make alloc_workqueue() take printf fmt and args for name · b196be89

由 Tejun Heo 提交于 1月 10, 2012

alloc_workqueue() currently expects the passed in @name pointer to remain
accessible.  This is inconvenient and a bit silly given that the whole wq
is being dynamically allocated.  This patch updates alloc_workqueue() and
friends to take printf format string instead of opaque string and matching
varargs at the end.  The name is allocated together with the wq and
formatted.

alloc_ordered_workqueue() is converted to a macro to unify varargs
handling with alloc_workqueue(), and, while at it, add comment to
alloc_workqueue().

None of the current in-kernel users pass in string with '%' as constant
name and this change shouldn't cause any problem.

[akpm@linux-foundation.org: use __printf]
Signed-off-by: NTejun Heo <tj@kernel.org>
Suggested-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b196be89

signal: add block_sigmask() for adding sigmask to current->blocked · 5e6292c0

由 Matt Fleming 提交于 1月 10, 2012

Abstract the code sequence for adding a signal handler's sa_mask to
current->blocked because the sequence is identical for all architectures.
Furthermore, in the past some architectures actually got this code wrong,
so introduce a wrapper that all architectures can use.
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5e6292c0

tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113

由 KAMEZAWA Hiroyuki 提交于 1月 10, 2012

oom_score_adj is used for guarding processes from OOM-Killer.  One of
problem is that it's inherited at fork().  When a daemon set oom_score_adj
and make children, it's hard to know where the value is set.

This patch adds some tracepoints useful for debugging. This patch adds
3 trace points.
  - creating new task
  - renaming a task (exec)
  - set oom_score_adj

To debug, users need to enable some trace pointer. Maybe filtering is useful as

# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable

output will be like this.
# grep oom /sys/kernel/debug/tracing/trace
bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

43d2b113

PM/Hibernate: do not count debug pages as savable · c6968e73

由 Stanislaw Gruszka 提交于 1月 10, 2012

When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
0, we have lot of free pages that are not marked so.  Snapshot code
account them as savable, what cause hibernate memory preallocation
failure.

It is pretty hard to make hibernate allocation succeed with
debug_guardpage_minorder=1.  This change at least make it possible when
system has relatively big amount of RAM.
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c6968e73

09 1月, 2012 1 次提交

audit: always follow va_copy() with va_end() · a0e86bd4

由 Jesper Juhl 提交于 1月 08, 2012

A call to va_copy() should always be followed by a call to va_end() in
the same function.  In kernel/autit.c::audit_log_vformat() this is not
always done.  This patch makes sure va_end() is always called.
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a0e86bd4

07 1月, 2012 2 次提交
- A
  vfs: switch ->show_options() to struct dentry * · 34c80b1d
  由 Al Viro 提交于 12月 08, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  34c80b1d
- A
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb · d8c9584e
  由 Al Viro 提交于 12月 07, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  d8c9584e
06 1月, 2012 1 次提交

cgroup: fix to allow mounting a hierarchy by name · 0d19ea86

由 Li Zefan 提交于 12月 27, 2011

If we mount a hierarchy with a specified name, the name is unique,
and we can use it to mount the hierarchy without specifying its
set of subsystem names. This feature is documented is
Documentation/cgroups/cgroups.txt section 2.3

Here's an example:

	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
	# mount -t cgroup -o name=myhier xxx /cgroup2

But it was broken by commit 32a8cf23
(cgroup: make the mount options parsing more accurate)

This fixes the regression.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

0d19ea86

05 1月, 2012 3 次提交

PM / Hibernate: Implement compat_ioctl for /dev/snapshot · c336078b

由 Ben Hutchings 提交于 12月 27, 2011

This allows uswsusp built for i386 to run on an x86_64 kernel (tested
with Debian package version 1.0+20110509-2).

References: http://bugs.debian.org/502816Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

c336078b

ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b

由 Oleg Nesterov 提交于 1月 04, 2012

This is the temporary simple fix for 3.2, we need more changes in this
area.

1. do_signal_stop() assumes that the running untraced thread in the
   stopped thread group is not possible. This was our goal but it is
   not yet achieved: a stopped-but-resumed tracee can clone the running
   thread which can initiate another group-stop.

   Remove WARN_ON_ONCE(!current->ptrace).

2. A new thread always starts with ->jobctl = 0. If it is auto-attached
   and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
   but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
   in do_jobctl_trap() if another debugger attaches.

   Change __ptrace_unlink() to set the artificial SIGSTOP for report.

   Alternatively we could change ptrace_init_task() to copy signr from
   current, but this means we can copy it for no reason and hide the
   possible similar problems.
Acked-by: NTejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>		[3.1]
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8a88951b

ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race · 50b8d257

由 Oleg Nesterov 提交于 1月 04, 2012

Test-case:

	int main(void)
	{
		int pid, status;

		pid = fork();
		if (!pid) {
			for (;;) {
				if (!fork())
					return 0;
				if (waitpid(-1, &status, 0) < 0) {
					printf("ERR!! wait: %m\n");
					return 0;
				}
			}
		}

		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
		assert(waitpid(-1, NULL, 0) == pid);

		assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
					PTRACE_O_TRACEFORK) == 0);

		do {
			ptrace(PTRACE_CONT, pid, 0, 0);
			pid = waitpid(-1, NULL, 0);
		} while (pid > 0);

		return 1;
	}

It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().

The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.

This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.

I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
Reported-and-tested-by: NDenys Vlasenko <vda.linux@googlemail.com>
Tested-by: NLukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>		[3.0+]
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50b8d257

04 1月, 2012 11 次提交

cgroup: move assignement out of condition in cgroup_attach_proc() · 305f3c8b

由 Dan Carpenter 提交于 1月 04, 2012

Gcc complains about this: "kernel/cgroup.c:2179:4: warning: suggest
parentheses around assignment used as truth value [-Wparentheses]"
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

305f3c8b

A
auditsc: propage umode_t · 93d3a10e
由 Al Viro 提交于 7月 27, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
93d3a10e
A
switch kern_ipc_perm to umode_t · 2570ebbd
由 Al Viro 提交于 7月 27, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
2570ebbd
A

switch mq_open() to umode_t · df0a4283
由 Al Viro 提交于 7月 26, 2011

df0a4283
A
sysctl: use umode_t for table permissions · 36fcb589
由 Al Viro 提交于 7月 26, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
36fcb589
A
cgroup: propagate mode_t · a5e7ed32
由 Al Viro 提交于 7月 26, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a5e7ed32
A
switch debugfs to umode_t · f4ae40a6
由 Al Viro 提交于 7月 24, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
f4ae40a6

switch vfs_mkdir() and ->mkdir() to umode_t · 18bb1db3

由 Al Viro 提交于 7月 26, 2011

vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

18bb1db3

fs: move code out of buffer.c · ff01bb48

由 Al Viro 提交于 9月 16, 2011

Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ff01bb48

get rid of timer in kern/acct.c · 32dc7308

由 Al Viro 提交于 12月 08, 2011

... and clean it up a bit, while we are at it
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

32dc7308

hung_task: fix false positive during vfork · f9fab10b

由 Mandeep Singh Baines 提交于 1月 03, 2012

vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Reported-by: NSasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f9fab10b

02 1月, 2012 1 次提交

misc latin1 to utf8 conversions · d36b6910

由 Al Viro 提交于 12月 29, 2011

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

d36b6910

01 1月, 2012 1 次提交

futex: Fix uninterruptible loop due to gate_area · e6780f72

由 Hugh Dickins 提交于 12月 31, 2011

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Reported-by: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e6780f72

31 12月, 2011 1 次提交

Revert "clockevents: Set noop handler in clockevents_exchange_device()" · 3b87487a

由 Linus Torvalds 提交于 12月 30, 2011

This reverts commit de28f25e.

It results in resume problems for various people. See for example

  http://thread.gmane.org/gmane.linux.kernel/1233033
  http://thread.gmane.org/gmane.linux.kernel/1233389
  http://thread.gmane.org/gmane.linux.kernel/1233159
  http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

and the fedora and ubuntu bug reports

  https://bugzilla.redhat.com/show_bug.cgi?id=767248
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

which got bisected down to the stable version of this commit.
Reported-by: NJonathan Nieder <jrnieder@gmail.com>
Reported-by: NPhil Miller <mille121@illinois.edu>
Reported-by: NPhilip Langdale <philipl@overt.org>
Reported-by: NTim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@suse.de>
Cc: stable@kernel.org    # for stable kernels that applied the original
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3b87487a

28 12月, 2011 4 次提交

irq: check domain hwirq range for DT translate · 93797d87

由 Rob Herring 提交于 12月 12, 2011

A DT node may have more than 1 domain associated with it, so make sure
the hwirq number is within range when doing DT translation.
Signed-off-by: NRob Herring <rob.herring@calxeda.com>
Acked-by: NGrant Likely <grant.likely@secretlab.ca>
Acked-by: NShawn Guo <shawn.guo@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>

93797d87

cgroup: Remove task_lock() from cgroup_post_fork() · 7e3aa30a

由 Frederic Weisbecker 提交于 12月 23, 2011

cgroup_post_fork() is protected between threadgroup_change_begin()
and threadgroup_change_end() against concurrent changes of the
child's css_set in cgroup_task_migrate(). Also the child can't
exit and call cgroup_exit() at this stage, this means it's css_set
can't be changed with init_css_set concurrently.

For these reasons, we don't need to hold task_lock() on the child
because it's css_set can only remain stable in this place.

Let's remove the lock there.

v2: Update comment to explain that we are safe against
cgroup_exit()
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Mandeep Singh Baines <msb@chromium.org>

7e3aa30a

cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end() · c6ca5750

由 Kirill A. Shutemov 提交于 12月 27, 2011

Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

c6ca5750

cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static · 1c6c3fad

由 Kirill A. Shutemov 提交于 12月 27, 2011

Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

1c6c3fad

27 12月, 2011 1 次提交

jump-label: export jump_label_inc/jump_label_dec · a65cf518

由 Xiao Guangrong 提交于 11月 28, 2011

Export these two symbols, they will be used by KVM mmu audit
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

a65cf518