提交 · b5740f4b2cb3503b436925eb2242bc3d75cd3dfe · openanolis / cloud-kernel

27 1月, 2012 4 次提交

sched: Fix ancient race in do_exit() · b5740f4b

由 Yasunori Goto 提交于 1月 17, 2012

try_to_wake_up() has a problem which may change status from TASK_DEAD to
TASK_RUNNING in race condition with SMI or guest environment of virtual
machine. As a result, exited task is scheduled() again and panic occurs.

Here is the sequence how it occurs:

 ----------------------------------+-----------------------------
                                   |
            CPU A                  |             CPU B
 ----------------------------------+-----------------------------

TASK A calls exit()....

do_exit()

  exit_mm()
    down_read(mm->mmap_sem);

    rwsem_down_failed_common()

      set TASK_UNINTERRUPTIBLE
      set waiter.task <= task A
      list_add to sem->wait_list
           :
      raw_spin_unlock_irq()
      (I/O interruption occured)

                                      __rwsem_do_wake(mmap_sem)

                                        list_del(&waiter->list);
                                        waiter->task = NULL
                                        wake_up_process(task A)
                                          try_to_wake_up()
                                             (task is still
                                               TASK_UNINTERRUPTIBLE)
                                              p->on_rq is still 1.)

                                              ttwu_do_wakeup()
                                                 (*A)
                                                   :
     (I/O interruption handler finished)

      if (!waiter.task)
          schedule() is not called
          due to waiter.task is NULL.

      tsk->state = TASK_RUNNING

          :
                                              check_preempt_curr();
                                                  :
  task->state = TASK_DEAD
                                              (*B)
                                        <---    set TASK_RUNNING (*C)

     schedule()
     (exit task is running again)
     BUG_ON() is called!
 --------------------------------------------------------

The execution time between (*A) and (*B) is usually very short,
because the interruption is disabled, and setting TASK_RUNNING at (*C)
must be executed before setting TASK_DEAD.

HOWEVER, if SMI is interrupted between (*A) and (*B),
(*C) is able to execute AFTER setting TASK_DEAD!
Then, exited task is scheduled again, and BUG_ON() is called....

If the system works on guest system of virtual machine, the time
between (*A) and (*B) may be also long due to scheduling of hypervisor,
and same phenomenon can occur.

By this patch, do_exit() waits for releasing task->pi_lock which is used
in try_to_wake_up(). It guarantees the task becomes TASK_DEAD after
waking up.
Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b5740f4b

sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug · 71325960

由 Suresh Siddha 提交于 1月 19, 2012

With the recent nohz scheduler changes, rq's nohz flag
'NOHZ_TICK_STOPPED' and its associated state doesn't get cleared
immediately after the cpu exits idle. This gets cleared as part
of the next tick seen on that cpu.

For the cpu offline support, we need to clear this state
manually. Fix it by registering a cpu notifier, which clears the
nohz idle load balance state for this rq explicitly during the
CPU_DYING notification.

There won't be any nohz updates for that cpu, after the
CPU_DYING notification. But lets be extra paranoid and skip
updating the nohz state in the select_nohz_load_balancer() if
the cpu is not in active state anymore.
Reported-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-and-tested-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327026538.16150.40.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

71325960

sched/s390: Fix compile error in sched/core.c · db7e527d

由 Christian Borntraeger 提交于 1月 11, 2012

Commit 029632fb ("sched: Make
separate sched*.c translation units") removed the include of
asm/mutex.h from sched.c.

This breaks the combination of:

 CONFIG_MUTEX_SPIN_ON_OWNER=yes
 CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX=yes

like s390 without mutex debugging:

  CC      kernel/sched/core.o
  kernel/sched/core.c: In function ‘mutex_spin_on_owner’:
  kernel/sched/core.c:3287: error: implicit declaration of function ‘arch_mutex_cpu_relax’

Lets re-add the include to kernel/sched/core.c
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1326268696-30904-1-git-send-email-borntraeger@de.ibm.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

db7e527d

sched: Fix rq->nr_uninterruptible update race · 4ca9b72b

由 Peter Zijlstra 提交于 1月 25, 2012

KOSAKI Motohiro noticed the following race:

 > CPU0                    CPU1
 > --------------------------------------------------------
 > deactivate_task()
 >                         task->state = TASK_UNINTERRUPTIBLE;
 > activate_task()
 >    rq->nr_uninterruptible--;
 >
 >                         schedule()
 >                           deactivate_task()
 >                             rq->nr_uninterruptible++;
 >

Kosaki-San's scenario is possible when CPU0 runs
__sched_setscheduler() against CPU1's current @task.

__sched_setscheduler() does a dequeue/enqueue in order to move
the task to its new queue (position) to reflect the newly provided
scheduling parameters. However it should be completely invariant to
nr_uninterruptible accounting, sched_setscheduler() doesn't affect
readyness to run, merely policy on when to run.

So convert the inappropriate activate/deactivate_task usage to
enqueue/dequeue_task, which avoids the nr_uninterruptible accounting.

Also convert the two other sites: __migrate_task() and
normalize_task() that still use activate/deactivate_task. These sites
aren't really a problem since __migrate_task() will only be called on
non-running task (and therefore are immume to the described problem)
and normalize_task() isn't ever used on regular systems.

Also remove the comments from activate/deactivate_task since they're
misleading at best.
Reported-by: NKOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptopSigned-off-by: NIngo Molnar <mingo@elte.hu>

4ca9b72b

24 1月, 2012 3 次提交

kernel-doc: fix kernel-doc warnings in sched · fa757281

由 Randy Dunlap 提交于 1月 21, 2012

Fix new kernel-doc notation warnings:

Warning(include/linux/sched.h:2094): No description found for parameter 'p'
Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa757281

kernel-doc: fix new warnings in auditsc.c · 42ae610c

由 Randy Dunlap 提交于 1月 21, 2012

Fix new kernel-doc warnings in auditsc.c:

Warning(kernel/auditsc.c:1875): No description found for parameter 'success'
Warning(kernel/auditsc.c:1875): No description found for parameter 'return_code'
Warning(kernel/auditsc.c:1875): Excess function parameter 'pt_regs' description in '__audit_syscall_exit'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

42ae610c

kprobes: initialize before using a hlist · d496aab5

由 Ananth N Mavinakayanahalli 提交于 1月 20, 2012

Commit ef53d9c5 ("kprobes: improve kretprobe scalability with hashed
locking") introduced a bug where we can potentially leak
kretprobe_instances since we initialize a hlist head after having used
it.

Initialize the hlist head before using it.

Reported by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: NJim Keniston <jkenisto@us.ibm.com>
Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srinivasa D S <srinivasa@in.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d496aab5

23 1月, 2012 1 次提交

net: introduce res_counter_charge_nofail() for socket allocations · 0e90b31f

由 Glauber Costa 提交于 1月 20, 2012

There is a case in __sk_mem_schedule(), where an allocation
is beyond the maximum, but yet we are allowed to proceed.
It happens under the following condition:

	sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case,
meaning that at some point later it'll try to do it. Since
this is never communicated to the underlying res_counter
code, there is an inbalance in res_counter uncharge operation.

I see two ways of fixing this:

1) storing the information about those allocations somewhere
   in memcg, and then deducting from that first, before
   we start draining the res_counter,
2) providing a slightly different allocation function for
   the res_counter, that matches the original behavior of
   the network code more closely.

I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.
Signed-off-by: NGlauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e90b31f

20 1月, 2012 1 次提交

PM / Hibernate: Correct additional pages number calculation · 160cb5a9

由 Namhyung Kim 提交于 1月 19, 2012

The struct bm_block is allocated by chain_alloc(),
so it'd better counting it in LINKED_PAGE_DATA_SIZE.
Signed-off-by: NNamhyung Kim <namhyung.kim@lge.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

160cb5a9

18 1月, 2012 26 次提交

audit: no leading space in audit_log_d_path prefix · c158a35c

由 Kees Cook 提交于 1月 06, 2012

audit_log_d_path() injects an additional space before the prefix,
which serves no purpose and doesn't mix well with other audit_log*()
functions that do not sneak extra characters into the log.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NEric Paris <eparis@redhat.com>

c158a35c

audit: fix signedness bug in audit_log_execve_info() · 5afb8a3f

由 Xi Wang 提交于 12月 20, 2011

In the loop, a size_t "len" is used to hold the return value of
audit_log_single_execve_arg(), which returns -1 on error.  In that
case the error handling (len <= 0) will be bypassed since "len" is
unsigned, and the loop continues with (p += len) being wrapped.
Change the type of "len" to signed int to fix the error handling.

	size_t len;
	...
	for (...) {
		len = audit_log_single_execve_arg(...);
		if (len <= 0)
			break;
		p += len;
	}
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

5afb8a3f

audit: comparison on interprocess fields · 10d68360

由 Peter Moody 提交于 1月 04, 2012

This allows audit to specify rules in which we compare two fields of a
process.  Such as is the running process uid != to the running process
euid?
Signed-off-by: NPeter Moody <pmoody@google.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

10d68360

audit: implement all object interfield comparisons · 4a6633ed

由 Peter Moody 提交于 12月 13, 2011

This completes the matrix of interfield comparisons between uid/gid
information for the current task and the uid/gid information for inodes.
aka I can audit based on differences between the euid of the process and
the uid of fs objects.
Signed-off-by: NPeter Moody <pmoody@google.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

4a6633ed

audit: allow interfield comparison between gid and ogid · c9fe685f

由 Eric Paris 提交于 1月 03, 2012

Allow audit rules to compare the gid of the running task to the gid of the
inode in question.
Signed-off-by: NEric Paris <eparis@redhat.com>

c9fe685f

audit: complex interfield comparison helper · b34b0393

由 Eric Paris 提交于 1月 03, 2012

Rather than code the same loop over and over implement a helper function which
uses some pointer magic to make it generic enough to be used numerous places
as we implement more audit interfield comparisons
Signed-off-by: NEric Paris <eparis@redhat.com>

b34b0393

audit: allow interfield comparison in audit rules · 02d86a56

由 Eric Paris 提交于 1月 03, 2012

We wish to be able to audit when a uid=500 task accesses a file which is
uid=0. Or vice versa. This patch introduces a new audit filter type
AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields
should be compared. At this point we only define the task->uid vs
inode->uid, but other comparisons can be added.
Signed-off-by: NEric Paris <eparis@redhat.com>

02d86a56

audit: do not call audit_getname on error · 4043cde8

由 Eric Paris 提交于 1月 03, 2012

Just a code cleanup really.  We don't need to make a function call just for
it to return on error.  This also makes the VFS function even easier to follow
and removes a conditional on a hot path.
Signed-off-by: NEric Paris <eparis@redhat.com>

4043cde8

audit: only allow tasks to set their loginuid if it is -1 · 633b4545

由 Eric Paris 提交于 1月 03, 2012

At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they
log in and it be impossible to ever reset. We had to make it mutable even
after it was once set (with the CAP) because on update and admin might have
to restart sshd. Now sshd would get his loginuid and the next user which
logged in using ssh would not be able to set his loginuid.

Systemd has changed how userspace works and allowed us to make the kernel
work the way it should. With systemd users (even admins) are not supposed
to restart services directly. The system will restart the service for
them. Thus since systemd is going to loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.

If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used! Since we have old systems I make this a
Kconfig option.
Signed-off-by: NEric Paris <eparis@redhat.com>

633b4545

audit: remove task argument to audit_set_loginuid · 0a300be6

由 Eric Paris 提交于 1月 03, 2012

The function always deals with current.  Don't expose an option
pretending one can use it for something.  You can't.
Signed-off-by: NEric Paris <eparis@redhat.com>

0a300be6

audit: allow audit matching on inode gid · 54d3218b

由 Eric Paris 提交于 1月 03, 2012

Much like the ability to filter audit on the uid of an inode collected, we
should be able to filter on the gid of the inode.
Signed-off-by: NEric Paris <eparis@redhat.com>

54d3218b

audit: allow matching on obj_uid · efaffd6e

由 Eric Paris 提交于 1月 03, 2012

Allow syscall exit filter matching based on the uid of the owner of an
inode used in a syscall.  aka:

auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa
Signed-off-by: NEric Paris <eparis@redhat.com>

efaffd6e

audit: remove audit_finish_fork as it can't be called · 6422e78d

由 Eric Paris 提交于 1月 03, 2012

Audit entry,always rules are not allowed and are automatically changed in
exit,always rules in userspace. The kernel refuses to load such rules.

Thus a task in the middle of a syscall (and thus in audit_finish_fork())
can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every
going to actually use the code in audit_finish_fork() since it will
return without doing anything. Thus drop the code.
Signed-off-by: NEric Paris <eparis@redhat.com>

6422e78d

audit: reject entry,always rules · 7ff68e53

由 Eric Paris 提交于 1月 03, 2012

We deprecated entry,always rules a long time ago.  Reject those rules as
invalid.
Signed-off-by: NEric Paris <eparis@redhat.com>

7ff68e53

audit: inline audit_free to simplify the look of generic code · a4ff8dba

由 Eric Paris 提交于 1月 03, 2012

make the conditional a static inline instead of doing it in generic code.
Signed-off-by: NEric Paris <eparis@redhat.com>

a4ff8dba

audit: inline checks for not needing to collect aux records · 07c49417

由 Eric Paris 提交于 1月 03, 2012

A number of audit hooks make function calls before they determine that
auxilary records do not need to be collected. Do those checks as static
inlines since the most common case is going to be that records are not
needed and we can skip the function call overhead.
Signed-off-by: NEric Paris <eparis@redhat.com>

07c49417

audit: drop some potentially inadvisable likely notations · 56179a6e

由 Eric Paris 提交于 1月 03, 2012

The audit code makes heavy use of likely() and unlikely() macros, but they
don't always make sense.  Drop any that seem questionable and let the
computer do it's thing.
Signed-off-by: NEric Paris <eparis@redhat.com>

56179a6e

audit: remove AUDIT_SETUP_CONTEXT as it isn't used · 997f5b64

由 Eric Paris 提交于 1月 03, 2012

Audit contexts have 3 states.  Disabled, which doesn't collect anything,
build, which collects info but might not emit it, and record, which
collects and emits.  There is a 4th state, setup, which isn't used.  Get
rid of it.
Signed-off-by: NEric Paris <eparis@redhat.com>

997f5b64

audit: inline audit_syscall_entry to reduce burden on archs · b05d8447

由 Eric Paris 提交于 1月 03, 2012

Every arch calls:

if (unlikely(current->audit_context))
	audit_syscall_entry()

which requires knowledge about audit (the existance of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.
Signed-off-by: NEric Paris <eparis@redhat.com>

b05d8447

Audit: push audit success and retcode into arch ptrace.h · d7e7528b

由 Eric Paris 提交于 1月 03, 2012

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]

d7e7528b

seccomp: audit abnormal end to a process due to seccomp · 85e7bac3

由 Eric Paris 提交于 1月 03, 2012

The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.
Signed-off-by: NEric Paris <eparis@redhat.com>

85e7bac3

audit: check current inode and containing object when filtering on major and minor · 16c174bd

由 Eric Paris 提交于 1月 03, 2012

The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon.  Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now lets
say we add a watch with a filter on 8,1.  If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.

Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents.  We might hope that
this would also be logged, but it isn't.  The rules will check the
major,minor of the device containing /dev/sda1.  In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.

I believe these rules should trigger on either device.  The man page is
devoid of useful information about the intended semantics.  It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...
Signed-off-by: NEric Paris <eparis@redhat.com>

16c174bd

audit: drop the meaningless and format breaking word 'user' · 3035c51e

由 Eric Paris 提交于 1月 03, 2012

userspace audit messages look like so:

type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''

That third field just says 'user'. That's useless and doesn't follow the
key=value pair we are trying to enforce. We already know it came from the
user based on the record type. Kill that word. Die.
Signed-off-by: NEric Paris <eparis@redhat.com>

3035c51e

audit: dynamically allocate audit_names when not enough space is in the names array · 5195d8e2

由 Eric Paris 提交于 1月 03, 2012

This patch does 2 things.  First it reduces the number of audit_names
allocated in every audit context from 20 to 5.  5 should be enough for all
'normal' syscalls (rename being the worst).  Some syscalls can still touch
more the 5 inodes such as mount.  When rpc filesystem is mounted it will
create inodes and those can exceed 5.  To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5.  This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.
Signed-off-by: NEric Paris <eparis@redhat.com>

5195d8e2

audit: make filetype matching consistent with other filters · 5ef30ee5

由 Eric Paris 提交于 1月 03, 2012

Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list. The filetype matching
however had a strange way of doing things. It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel. Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea. As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field. The kernel always checked the first
name collected which for the tested rules was always correct.

This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected. It
also changes the rule validation to reject the old unused rule types.

Noone knew it was there. Noone used it. Why keep around the extra code?
Signed-off-by: NEric Paris <eparis@redhat.com>

5ef30ee5

Revert "capabitlies: ns_capable can use the cap helpers rather than lsm call" · 951880e6

由 Linus Torvalds 提交于 1月 17, 2012

This reverts commit d2a7009f.

J. R. Okajima explains:

 "After this commit, I am afraid access(2) on NFS may not work
  correctly.  The scenario based upon my guess.
   - access(2) overrides the credentials.
   - calls inode_permission() -- ... -- generic_permission() --
      ns_capable().
   - while the old ns_capable() calls security_capable(current_cred()),
     the new ns_capable() calls has_ns_capability(current) --
     security_capable(__task_cred(t)).

  current_cred() returns current->cred which is effective (overridden)
  credentials, but __task_cred(current) returns current->real_cred (the
  NFSD's credential).  And the overridden credentials by access(2) lost."
Requested-by: NJ. R. Okajima <hooanon05@yahoo.co.jp>
Acked-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

951880e6

17 1月, 2012 1 次提交

tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT · c10076c4

由 Steven Rostedt 提交于 1月 13, 2012

Tracepoints are disabled for tainted modules, which is usually because the
module is either proprietary or was forced, and we don't want either of them
using kernel tracepoints.

But, a module can also be tainted by being in the staging directory or
compiled out of tree. Either is fine for use with tracepoints, no need
to punish them.  I found this out when I noticed that my sample trace event
module, when done out of tree, stopped working.

Cc: stable@vger.kernel.org # 3.2
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

c10076c4

16 1月, 2012 1 次提交

error: implicit declaration of function 'module_flags_taint' · 53999bf3

由 Kevin Winchester 提交于 1月 15, 2012

Recent changes to kernel/module.c caused the following compile
error:

  kernel/module.c: In function ‘show_taint’:
  kernel/module.c:1024:2: error: implicit declaration of function ‘module_flags_taint’ [-Werror=implicit-function-declaration]
  cc1: some warnings being treated as errors

Correct this error by moving the definition of module_flags_taint
outside of the #ifdef CONFIG_MODULE_UNLOAD section.
Signed-off-by: NKevin Winchester <kjwinchester@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

53999bf3

14 1月, 2012 2 次提交

PM / Hibernate: Drop the check of swap space size for compressed image · ee34a370

由 Barry Song 提交于 1月 09, 2012

For compressed image, the space required is not known until
we finish compressing and writing all pages.
This patch drops the check, and if swap space is not enough
finally, system can still restore to normal after writing
swap fails for compressed images.
Signed-off-by: NBarry Song <Baohua.Song@csr.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

ee34a370

PM: Make sysrq-o be available for CONFIG_PM unset · dae5cbc2

由 Rafael J. Wysocki 提交于 1月 14, 2012

After commit 1eb208ae, "PM: Make
CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)", the
files under kernel/power are not built unless CONFIG_PM_SLEEP or
CONFIG_PM_RUNTIME is set.  In particular, this causes
kernel/power/poweroff.c to be omitted, even though it should be
compiled, because CONFIG_MAGIC_SYSRQ is set.

Fix the problem by causing kernel/power/Makefile to be processed
for CONFIG_PM unset too.
Reported-and-tested-by: NPhil Oester <kernel@linuxace.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

dae5cbc2

13 1月, 2012 1 次提交

c/r: prctl: add PR_SET_MM codes to set up mm_struct entries · 028ee4be

由 Cyrill Gorcunov 提交于 1月 12, 2012

When we restore a task we need to set up text, data and data heap sizes
from userspace to the values a task had at checkpoint time.  This patch
adds auxilary prctl codes for that.

While most of them have a statistical nature (their values are involved
into calculation of /proc/<pid>/statm output) the start_brk and brk values
are used to compute an allowed size of program data segment expansion.
Which means an arbitrary changes of this values might be dangerous
operation.  So to restrict access the following requirements applied to
prctl calls:

 - The process has to have CAP_SYS_ADMIN capability granted.
 - For all opcodes except start_brk/brk members an appropriate
   VMA area must exist and should fit certain VMA flags,
   such as:
   - code segment must be executable but not writable;
   - data segment must not be executable.

start_brk/brk values must not intersect with data segment and must not
exceed RLIMIT_DATA resource limit.

Still the main guard is CAP_SYS_ADMIN capability check.

Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support
otherwise these prctl calls will return -EINVAL.

[akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text]
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

028ee4be

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功