提交 · d865e573b8a4f30fbb74fa7666ca81e3132eb547 · openeuler / raspberrypi-kernel

13 1月, 2016 5 次提交

audit: Delete unnecessary checks before two function calls · d865e573

由 Markus Elfring 提交于 1月 13, 2016

The functions consume_skb() and kfree_skb() test whether their argument
is NULL and then return immediately.
Thus the tests around their calls are not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
[PM: tweak patch prefix]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

d865e573

audit: wake up threads if queue switched from limited to unlimited · 1194b994

由 Richard Guy Briggs 提交于 1月 13, 2016

If the audit_backlog_limit is changed from a limited value to an
unlimited value (zero) while the queue was overflowed, wake up the
audit_backlog_wait queue to allow those processes to continue.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

1194b994

audit: include auditd's threads in audit_log_start() wait exception · f48a9429

由 Richard Guy Briggs 提交于 1月 13, 2016

Should auditd spawn threads, allow all members of its thread group to
use the audit_backlog_limit reserves to bypass the queue limits too.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: minor upstream merge tweaks]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

f48a9429

audit: remove audit_backlog_wait_overflow · eb8baf6a

由 Paul Moore 提交于 1月 13, 2016

It seems much more obvious and readable to simply use "0".
Signed-off-by: NPaul Moore <pmoore@redhat.com>

eb8baf6a

audit: don't needlessly reset valid wait time · c4b7a775

由 Richard Guy Briggs 提交于 1月 13, 2016

After auditd has recovered from an overflowed queue, the first process
that doesn't use reserves to make it through the queue checks should
reset the audit backlog wait time to the configured value.  After that,
there is no need to keep resetting it.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

c4b7a775

07 11月, 2015 1 次提交

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc

由 Mel Gorman 提交于 11月 06, 2015

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d0164adc

04 11月, 2015 4 次提交

audit: make audit_log_common_recv_msg() a void function · 233a6866

由 Paul Moore 提交于 11月 04, 2015

It always returns zero and no one is checking the return value.
Signed-off-by: NPaul Moore <pmoore@redhat.com>

233a6866

audit: removing unused variable · c5ea6efd

由 Saurabh Sengar 提交于 11月 04, 2015

Variable rc in not required as it is just used for unchanged for return,
and return is always 0 in the function.
Signed-off-by: NSaurabh Sengar <saurabh.truth@gmail.com>
[PM: fixed spelling errors in description]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

c5ea6efd

audit: audit_string_contains_control can be boolean · 9fcf836b

由 Yaowei Bai 提交于 11月 04, 2015

This patch makes audit_string_contains_control return bool to improve
readability due to this particular function only using either one or
zero as its return value.
Signed-off-by: NYaowei Bai <bywxiaobai@163.com>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

9fcf836b

audit: try harder to send to auditd upon netlink failure · 32a1dbae

由 Richard Guy Briggs 提交于 11月 04, 2015

There are several reports of the kernel losing contact with auditd when
it is, in fact, still running.  When this happens, kernel syslogs show:
	"audit: *NO* daemon at audit_pid=<pid>"
although auditd is still running, and is apparently happy, listening on
the netlink socket. The pid in the "*NO* daemon" message matches the pid
of the running auditd process.  Restarting auditd solves this.

The problem appears to happen randomly, and doesn't seem to be strongly
correlated to the rate of audit events being logged.  The problem
happens fairly regularly (every few days), but not yet reproduced to
order.

On production kernels, BUG_ON() is a no-op, so any error will trigger
this.

Commit 34eab0a7 ("audit: prevent an older auditd shutdown from
orphaning a newer auditd startup") eliminates one possible cause.  This
isn't the case here, since the PID in the error message and the PID of
the running auditd match.

The primary expected cause of error here is -ECONNREFUSED when the audit
daemon goes away, when netlink_getsockbyportid() can't find the auditd
portid entry in the netlink audit table (or there is no receive
function).  If -EPERM is returned, that situation isn't likely to be
resolved in a timely fashion without administrator intervention.  In
both cases, reset the audit_pid.  This does not rule out a race
condition.  SELinux is expected to return zero since this isn't an INET
or INET6 socket.  Other LSMs may have other return codes.  Log the error
code for better diagnosis in the future.

In the case of -ENOMEM, the situation could be temporary, based on local
or general availability of buffers.  -EAGAIN should never happen since
the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
-ERESTARTSYS and -EINTR are not expected since this kernel thread is not
expected to receive signals.  In these cases (or any other unexpected
ones for now), report the error and re-schedule the thread, retrying up
to 5 times.

v2:
	Removed BUG_ON().
	Moved comma in pr_*() statements.
	Removed audit_strerror() text.
Reported-by: NVipin Rathor <v.rathor@gmail.com>
Reported-by: <ctcard@hotmail.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: applied rgb's fixup patch to correct audit_log_lost() format issues]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

32a1dbae

07 8月, 2015 1 次提交

audit: use macros for unset inode and device values · 84cb777e

由 Richard Guy Briggs 提交于 8月 05, 2015

Clean up a number of places were casted magic numbers are used to represent
unset inode and device numbers in preparation for the audit by executable path
patch set.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: enclosed the _UNSET macros in parentheses for ./scripts/checkpatch]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

84cb777e

30 5月, 2015 1 次提交

audit: fix for typo in comment to function audit_log_link_denied() · 22011964

由 Shailendra Verma 提交于 5月 23, 2015

Signed-off-by: NShailendra Verma <shailendra.capricorn@gmail.com>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

22011964

16 4月, 2015 1 次提交

VFS: audit: d_backing_inode() annotations · 3b362157

由 David Howells 提交于 3月 17, 2015

Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3b362157

14 3月, 2015 1 次提交

audit: Remove condition which always evaluates to false · 724e7bfc

由 Pranith Kumar 提交于 3月 11, 2015

After commit 3e1d0bb6 ("audit: Convert int limit
uses to u32"), by converting an int to u32, few conditions will always evaluate
to false.

These warnings were emitted during compilation:

kernel/audit.c: In function ‘audit_set_enabled’:
kernel/audit.c:347:2: warning: comparison of unsigned expression < 0 is always
false [-Wtype-limits]
  if (state < AUDIT_OFF || state > AUDIT_LOCKED)
	  ^
	  kernel/audit.c: In function ‘audit_receive_msg’:
	  kernel/audit.c:880:9: warning: comparison of unsigned expression < 0 is
	  always false [-Wtype-limits]
	      if (s.backlog_wait_time < 0 ||

The following patch removes those unnecessary conditions.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

724e7bfc

24 2月, 2015 5 次提交

audit: reduce mmap_sem hold for mm->exe_file · 5b282552

由 Davidlohr Bueso 提交于 2月 22, 2015

The mm->exe_file is currently serialized with mmap_sem (shared)
in order to both safely (1) read the file and (2) audit it via
audit_log_d_path(). Good users will, on the other hand, make use
of the more standard get_mm_exe_file(), requiring only holding
the mmap_sem to read the value, and relying on reference counting
to make sure that the exe file won't dissapear underneath us.

Additionally, upon NULL return of get_mm_exe_file, we also call
audit_log_format(ab, " exe=(null)").
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

5b282552

audit: consolidate handling of mm->exe_file · 4766b199

由 Davidlohr Bueso 提交于 2月 22, 2015

This patch adds a audit_log_d_path_exe() helper function
to share how we handle auditing of the exe_file's path.
Used by both audit and auditsc. No functionality is changed.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

4766b199

audit: code clean up · 5985de67

由 Ameen Ali 提交于 2月 23, 2015

Fixed a coding style issue (unnecessary parentheses , unnecessary braces)
Signed-off-by: NAmeen-Ali <Ameenali023@gmail.com>
[PM: tweaked subject line]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

5985de67

audit: don't reset working wait time accidentally with auditd · efef73a1

由 Richard Guy Briggs 提交于 2月 23, 2015

During a queue overflow condition while we are waiting for auditd to drain the
queue to make room for regular messages, we don't want a successful auditd that
has bypassed the queue check to reset the backlog wait time.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

efef73a1

audit: don't lose set wait time on first successful call to audit_log_start() · a77ed4e5

由 Richard Guy Briggs 提交于 2月 23, 2015

Copy the set wait time to a working value to avoid losing the set
value if the queue overflows.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

a77ed4e5

27 12月, 2014 1 次提交

netlink/genetlink: pass network namespace to bind/unbind · 023e2cfa

由 Johannes Berg 提交于 12月 23, 2014

Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.

To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.

Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

023e2cfa

20 12月, 2014 1 次提交

audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb · 54dc77d9

由 Richard Guy Briggs 提交于 12月 18, 2014

Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
audit_log_end(), which can come from any context (aka even a sleeping context)
GFP_KERNEL can't be used.  Since the audit_buffer knows what context it should
use, pass that down and use that.

See: https://lkml.org/lkml/2014/12/16/542

BUG: sleeping function called from invalid context at mm/slab.c:2849
in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
2 locks held by sulogin/885:
  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
  #1:  (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
  ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
  ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
  0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
Call Trace:
  [<ffffffff916ba529>] dump_stack+0x50/0xa8
  [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
  [<ffffffff910632a6>] __might_sleep+0x119/0x128
  [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
  [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
  [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
  [<ffffffff914e2b62>] skb_copy+0x3e/0xa3
  [<ffffffff910c263e>] audit_log_end+0x83/0x100
  [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
  [<ffffffff91252a73>] common_lsm_audit+0x441/0x450
  [<ffffffff9123c163>] slow_avc_audit+0x63/0x67
  [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
  [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
  [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
  [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
  [<ffffffff911515e6>] install_exec_creds+0xe/0x79
  [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
  [<ffffffff9115198e>] search_binary_handler+0x81/0x18c
  [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
  [<ffffffff91153669>] do_execve+0x1f/0x21
  [<ffffffff91153967>] SyS_execve+0x25/0x29
  [<ffffffff916c61a9>] stub_execve+0x69/0xa0

Cc: stable@vger.kernel.org #v3.16-rc1
Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Tested-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: NPaul Moore <pmoore@redhat.com>

54dc77d9

18 11月, 2014 1 次提交

audit: convert status version to a feature bitmap · 0288d718

由 Richard Guy Briggs 提交于 11月 17, 2014

The version field defined in the audit status structure was found to have
limitations in terms of its expressibility of features supported.  This is
distict from the get/set features call to be able to command those features
that are present.

Converting this field from a version number to a feature bitmap will allow
distributions to selectively backport and support certain features and will
allow upstream to be able to deprecate features in the future.  It will allow
userspace clients to first query the kernel for which features are actually
present and supported.  Currently, EINVAL is returned rather than EOPNOTSUP,
which isn't helpful in determining if there was an error in the command, or if
it simply isn't supported yet.  Past features are not represented by this
bitmap, but their use may be converted to EOPNOTSUP if needed in the future.

Since "version" is too generic to convert with a #define, use a union in the
struct status, introducing the member "feature_bitmap" unionized with
"version".

Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP*
counterparts, leaving the former for backwards compatibility.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: minor whitespace tweaks]
Signed-off-by: NPaul Moore <pmoore@redhat.com>

0288d718

04 11月, 2014 1 次提交

audit, sched/wait: Fixup kauditd_thread() wait loop · 6b55fc63

由 Peter Zijlstra 提交于 10月 02, 2014

The kauditd_thread wait loop is a bit iffy; it has a number of problems:

 - calls try_to_freeze() before schedule(); you typically want the
   thread to re-evaluate the sleep condition when unfreezing, also
   freeze_task() issues a wakeup.

 - it unconditionally does the {add,remove}_wait_queue(), even when the
   sleep condition is false.

Use wait_event_freezable() that does the right thing.
Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: oleg@redhat.com
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

6b55fc63

31 10月, 2014 1 次提交

audit: AUDIT_FEATURE_CHANGE message format missing delimiting space · 897f1acb

由 Richard Guy Briggs 提交于 10月 30, 2014

Add a space between subj= and feature= fields to make them parsable.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaul Moore <pmoore@redhat.com>

897f1acb

24 9月, 2014 7 次提交

audit: get comm using lock to avoid race in string printing · 9eab339b

由 Richard Guy Briggs 提交于 3月 15, 2014

When task->comm is passed directly to audit_log_untrustedstring() without
getting a copy or using the task_lock, there is a race that could happen that
would output a NULL (\0) in the output string that would effectively truncate
the rest of the report text after the comm= field in the audit, losing fields.

Use get_task_comm() to get a copy while acquiring the task_lock to prevent
this and to prevent the result from being a mixture of old and new values of
comm.
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

9eab339b

audit: correct AUDIT_GET_FEATURE return message type · 9ef91514

由 Richard Guy Briggs 提交于 8月 24, 2014

When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature.  The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.

This appears to have been a cut-and-paste-eo in commit b0fed402.
Reported-by: NSteve Grubb <sgrubb@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

9ef91514

audit: set nlmsg_len for multicast messages. · 54e05edd

由 Richard Guy Briggs 提交于 8月 21, 2014

Report:
	Looking at your example code in
	http://people.redhat.com/rbriggs/audit-multicast-listen/audit-multicast-listen.c,
	it seems that nlmsg_len field in the received messages is supposed to
	contain the length of the header + payload, but it is always set to the
	size of the header only, i.e. 16. The example program works, because
	the printf format specifies the minimum width, not "precision", so it
	simply prints out the payload until the first zero byte. This isn't too
	much of a problem, but precludes the use of recvmmsg, iiuc?

	(gdb) p *(struct nlmsghdr*)nlh
	$14 = {nlmsg_len = 16, nlmsg_type = 1100, nlmsg_flags = 0, nlmsg_seq = 0, nlmsg_pid = 9910}

The only time nlmsg_len would have been updated was at audit_buffer_alloc()
inside audit_log_start() and never updated after.  It should arguably be done
in audit_log_vformat(), but would be more efficient in audit_log_end().
Reported-by: NZbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

54e05edd

audit: use atomic_t to simplify audit_serial() · 01478d7d

由 Richard Guy Briggs 提交于 6月 13, 2014

Since there is already a primitive to do this operation in the atomic_t, use it
to simplify audit_serial().
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

01478d7d

kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0] · 6eed9b26

由 Fabian Frederick 提交于 6月 03, 2014

Use kernel.h definition.

Cc: Eric Paris <eparis@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

6eed9b26

audit: reduce scope of audit_log_fcaps · 691e6d59

由 Richard Guy Briggs 提交于 5月 26, 2014

audit_log_fcaps() isn't used outside kernel/audit.c.  Reduce its scope.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

691e6d59

audit: reduce scope of audit_net_id · c0a8d9b0

由 Richard Guy Briggs 提交于 5月 26, 2014

audit_net_id isn't used outside kernel/audit.c.  Reduce its scope.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>

c0a8d9b0

24 7月, 2014 1 次提交

CAPABILITIES: remove undefined caps from all processes · 7d8b6c63

由 Eric Paris 提交于 7月 23, 2014

This is effectively a revert of 7b9a7ec5
plus fixing it a different way...

We found, when trying to run an application from an application which
had dropped privs that the kernel does security checks on undefined
capability bits.  This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.

Consider a root application which drops all capabilities from ALL 4
capability sets.  We assume, since the application is going to set
eff/perm/inh from an array that it will clear not only the defined caps
less than CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.

The BSET gets cleared differently.  Instead it is cleared one bit at a
time.  The problem here is that in security/commoncap.c::cap_task_prctl()
we actually check the validity of a capability being read.  So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.

So the 'parent' will look something like:
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffc000000000

All of this 'should' be fine.  Given that these are undefined bits that
aren't supposed to have anything to do with permissions.  But they do...

So lets now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel).  We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in that above situation if you try to clear all of
you capapabilities from all 4 sets.  If that root task calls execve()
the child task will pick up all caps not blocked by the bset.  The bset
however does not block bits higher than CAP_LAST_CAP.  So now the child
task has bits in eff which are not in the parent.  These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.

The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits!  So now we set durring commit creds that
the child is not dumpable.  Given it is 'more priv' than its parent.  It
also means the parent cannot ptrace the child and other stupidity.

The solution here:
1) stop hiding capability bits in status
	This makes debugging easier!

2) stop giving any task undefined capability bits.  it's simple, it you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.
	This fixes the cap_issubset() tests and resulting fallout (which
	made the init task in a docker container untraceable among other
	things)

3) mask out undefined bits when sys_capset() is called as it might use
~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
	This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.

4) mask out undefined bit when we read a file capability off of disk as
again likely all bits are set in the xattr for forward/backward
compatibility.
	This lets 'setcap all+pe /bin/bash; /bin/bash' run
Signed-off-by: NEric Paris <eparis@redhat.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Dan Walsh <dwalsh@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

7d8b6c63

07 6月, 2014 1 次提交

ipc, kernel: use Linux headers · 7153e402

由 Paul McQuade 提交于 6月 06, 2014

Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>
Signed-off-by: NPaul McQuade <paulmcquad@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7153e402

25 4月, 2014 1 次提交

net: Use netlink_ns_capable to verify the permisions of netlink messages · 90f62cf3

由 Eric W. Biederman 提交于 4月 23, 2014

It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90f62cf3

23 4月, 2014 3 次提交

audit: send multicast messages only if there are listeners · 7f74ecd7

由 Richard Guy Briggs 提交于 4月 22, 2014

Test first to see if there are any userspace multicast listeners bound to the
socket before starting the multicast send work.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f74ecd7

audit: add netlink multicast group for log read · 451f9216

由 Richard Guy Briggs 提交于 4月 22, 2014

Add a netlink multicast socket with one group to kaudit for "best-effort"
delivery to read-only userspace clients such as systemd, in addition to the
existing bidirectional unicast auditd userspace client.

Currently, auditd is intended to use the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
capabilities, but actually uses CAP_NET_ADMIN. The CAP_AUDIT_READ capability
is added for use by read-only AUDIT_NLGRP_READLOG netlink multicast group
clients to the kaudit subsystem.

This will safely give access to services such as systemd to consume audit logs
while ensuring write access remains restricted for integrity.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

451f9216

audit: add netlink audit protocol bind to check capabilities on multicast join · 3a101b8d

由 Richard Guy Briggs 提交于 4月 22, 2014

Register a netlink per-protocol bind fuction for audit to check userspace
process capabilities before allowing a multicast group connection.
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a101b8d

01 4月, 2014 1 次提交

AUDIT: Allow login in non-init namespaces · 543bc6a1

由 Eric Paris 提交于 3月 30, 2014

It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent.  This is common in
many distros and thus normal configuration of many containers.  The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel.  If it gets any other error else it refuses to let the login
proceed.

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace.  So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out.  Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!

This fix is kinda hacky.  We return ECONNREFUSED for all non-init
relevant namespaces.  That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'.  They don't really work, since audit
isn't logging things.  But it's what most users want.

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world.  This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...
Reported-by: NAndre Tomt <andre@tomt.net>
Reported-by: NAdam Richter <adam_richter2004@yahoo.com>
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	kernel/audit.c

543bc6a1

31 3月, 2014 1 次提交

AUDIT: Allow login in non-init namespaces · aa4af831

由 Eric Paris 提交于 3月 30, 2014

It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent. This is common in
many distros and thus normal configuration of many containers. The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel. If it gets any other error else it refuses to let the login
proceed.

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace. So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM). Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out. Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!

This fix is kinda hacky. We return ECONNREFUSED for all non-init
relevant namespaces. That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'. They don't really work, since audit
isn't logging things. But it's what most users want.

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world. This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...
Reported-by: NAndre Tomt <andre@tomt.net>
Reported-by: NAdam Richter <adam_richter2004@yahoo.com>
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa4af831

25 3月, 2014 1 次提交

kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c · e231d54c

由 Monam Agarwal 提交于 3月 24, 2014

This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)
Signed-off-by: NMonam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

e231d54c