Commits · 791ec491c372f49cea3ea7a7143454a9023ac9d4 · openeuler / raspberrypi-kernel

06 Mar, 2017 1 commit

prlimit,security,selinux: add a security hook for prlimit · 791ec491

Authored by Stephen Smalley on Feb 17, 2017

When SELinux was first added to the kernel, a process could only get
and set its own resource limits via getrlimit(2) and setrlimit(2), so no
MAC checks were required for those operations, and thus no security hooks
were defined for them. Later, SELinux introduced a hook for setrlimit(2)
with a check if the hard limit was being changed in order to be able to
rely on the hard limit value as a safe reset point upon context
transitions.

Later on, when prlimit(2) was added to the kernel with the ability to get
or set resource limits (hard or soft) of another process, LSM/SELinux was
not updated other than to pass the target process to the setrlimit hook.
This resulted in incomplete control over both getting and setting the
resource limits of another process.

Add a new security_task_prlimit() hook to the check_prlimit_permission()
function to provide complete mediation. The hook is only called when
acting on another task, and only if the existing DAC/capability checks
would allow access. Pass flags down to the hook to indicate whether the
prlimit(2) call will read, write, or both read and write the resource
limits of the target process.
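
A minimal user-space sketch of the mediation described above, not the kernel code. The flag names (SKETCH_PRLIMIT_READ, SKETCH_PRLIMIT_WRITE), the struct, and both helper functions are invented for illustration. It models two points from the text: the flags reflect whether the caller asked to read, write, or both, and the hook is consulted only when acting on another task and only after the DAC/capability checks pass.

```c
#include <stddef.h>

/* Hypothetical flag names for this sketch; the commit text only says that
 * flags indicate a read, a write, or both. */
#define SKETCH_PRLIMIT_READ  1u
#define SKETCH_PRLIMIT_WRITE 2u

/* Stand-in for a resource limit pair (soft and hard). */
struct sketch_rlimit {
	unsigned long cur;
	unsigned long max;
};

/* Derive the access flags from which buffers the caller supplied:
 * an "old" buffer means the limits are read back, a "new" buffer
 * means the limits are written. */
static unsigned int sketch_prlimit_flags(const struct sketch_rlimit *new_rlim,
					 const struct sketch_rlimit *old_rlim)
{
	unsigned int flags = 0;

	if (old_rlim)
		flags |= SKETCH_PRLIMIT_READ;
	if (new_rlim)
		flags |= SKETCH_PRLIMIT_WRITE;
	return flags;
}

/* Returns 1 if the modelled security hook would be consulted:
 * never for the caller's own task, and never when the DAC/capability
 * checks have already refused access. */
static int sketch_hook_is_called(int caller_id, int target_id, int dac_allows)
{
	if (caller_id == target_id)
		return 0;
	if (!dac_allows)
		return 0;
	return 1;
}
```

In this model a call that passes both buffers yields both flags, which corresponds to the "both read and write" case mentioned in the commit message.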

The existing security_task_setrlimit() hook is left alone; it continues
to serve a purpose in supporting the ability to make decisions based on
the old and/or new resource limit values when setting limits. This
is consistent with the DAC/capability logic, where
check_prlimit_permission() performs generic DAC/capability checks for
acting on another task, while do_prlimit() performs a capability check
based on a comparison of the old and new resource limits. Fix the
inline documentation for the hook to match the code.

Implement the new hook for SELinux. For setting resource limits, we
reuse the existing setrlimit permission. Note that this does overload
the setrlimit permission to mean the ability to set the resource limit
(soft or hard) of another process or the ability to change one's own
hard limit. For getting resource limits, a new getrlimit permission
is defined. This was not originally defined since getrlimit(2) could
only be used to obtain a process' own limits.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <james.l.morris@oracle.com>

02 Mar, 2017 3 commits

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> · 29930025

Authored by Ingo Molnar on Feb 08, 2017

We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014

Authored by Ingo Molnar on Feb 08, 2017

We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

selinux: wrap cgroup seclabel support with its own policy capability · 2651225b

Authored by Stephen Smalley on Feb 28, 2017

commit 1ea0ce40 ("selinux: allow
changing labels for cgroupfs") broke the Android init program,
which looks up security contexts whenever creating directories
and attempts to assign them via setfscreatecon().
When creating subdirectories in cgroup mounts, this would previously
be ignored since cgroup did not support userspace setting of security
contexts.  However, after the commit, SELinux would attempt to honor
the requested context on cgroup directories and fail due to permission
denial.  Avoid breaking existing userspace/policy by wrapping this change
with a conditional on a new cgroup_seclabel policy capability.  This
preserves existing behavior until/unless a new policy explicitly enables
this capability.

Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

28 Feb, 2017 1 commit

lib/vsprintf.c: remove %Z support · 5b5e0928

Authored by Alexey Dobriyan on Feb 27, 2017

Now that %z is standardised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.
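
For reference, a small user-space sketch of the C99 %zu conversion that takes the place of the old %Z spelling when printing size_t values. The helper name sketch_format_size is hypothetical; snprintf() and the z length modifier are standard C99.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a size_t using the C99 "z" length modifier.
 * Returns the number of characters that snprintf() reports. */
static int sketch_format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "%zu", value);
}
```

A caller would pass a buffer and a size, for example sketch_format_size(buf, sizeof(buf), sizeof(long)).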

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice, lib/vsprintf.o is about half of SLUB, which
in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.


Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25 Feb, 2017 1 commit

mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf · 11bac800

Authored by Dave Jiang on Feb 24, 2017

->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.
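
The shape of the change can be illustrated with a user-space model. The struct and function names below are hypothetical stand-ins, not the kernel's definitions; the only point shown is that a handler taking just the fault descriptor can reach the vma through it, so the separate vma argument is redundant.

```c
/* Hypothetical stand-in for a memory area with a half-open address range. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Hypothetical stand-in for the fault descriptor, which already
 * carries a pointer to the area the fault belongs to. */
struct sketch_vm_fault {
	struct sketch_vma *vma;
	unsigned long address;
};

/* Old style: the vma is passed alongside the fault descriptor. */
static int sketch_fault_old(struct sketch_vma *vma, struct sketch_vm_fault *vmf)
{
	return vmf->address >= vma->vm_start && vmf->address < vma->vm_end;
}

/* New style: the handler takes only the fault descriptor and
 * picks the vma out of it. */
static int sketch_fault_new(struct sketch_vm_fault *vmf)
{
	struct sketch_vma *vma = vmf->vma;

	return vmf->address >= vma->vm_start && vmf->address < vma->vm_end;
}
```

Both handlers return the same answer for the same fault, which is why dropping the extra parameter does not change behaviour in this model.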

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08 Feb, 2017 3 commits

selinux: fix off-by-one in setprocattr · 0c461cb7

Authored by Stephen Smalley on Jan 31, 2017

SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute.  However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb ("proc_pid_attr_write():
switch to memdup_user()").  Fix the off-by-one error.
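
A minimal user-space sketch of the parsing rule described above, not the kernel code itself. It assumes the intended test looks at the first byte of the written value (index 0) and never reads beyond the bytes actually supplied; both function names are hypothetical.

```c
#include <stddef.h>

/* Returns 1 if the write should clear the attribute, 0 if it sets one.
 * Only buf[0] is examined, and only when at least one byte was written,
 * so a one-byte buffer is never read past its end. */
static int sketch_is_clear_request(const char *buf, size_t size)
{
	if (size == 0)
		return 1;
	return buf[0] == '\0' || buf[0] == '\n';
}

/* Length of the value with a single terminating newline ignored. */
static size_t sketch_value_len(const char *buf, size_t size)
{
	if (size > 0 && buf[size - 1] == '\n')
		return size - 1;
	return size;
}
```

With these helpers a write of a lone newline counts as a clear request, while a context string followed by a newline is a set request whose trailing newline is dropped.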

Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate

Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.

There are no users of this facility to my knowledge; possibly we
should just get rid of it.

UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug.  This patch fixes CVE-2017-2618.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Cc: stable@vger.kernel.org # 3.5: d6ea83ec
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

selinux: allow changing labels for cgroupfs · 1ea0ce40

Authored by Antonio Murdaca on Feb 02, 2017

This patch allows changing labels for cgroup mounts. Previously, running
chcon on cgroupfs would throw an "Operation not supported". This patch
specifically whitelists cgroupfs.

The patch could also allow containers to write only to the systemd cgroup
for instance, while the other cgroups are kept with cgroup_t label.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: fix off-by-one in setprocattr · a050a570

Authored by Stephen Smalley on Jan 31, 2017

SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute.  However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb ("proc_pid_attr_write():
switch to memdup_user()").  Fix the off-by-one error.

Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate

Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.

There are no users of this facility to my knowledge; possibly we
should just get rid of it.

UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug.  This patch fixes CVE-2017-2618.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Cc: stable@vger.kernel.org # 3.5: d6ea83ec
Signed-off-by: Paul Moore <paul@paul-moore.com>

25 Jan, 2017 1 commit

Introduce a sysctl that modifies the value of PROT_SOCK. · 4548b683

Authored by Krister Johansen on Jan 20, 2017

Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace. To
disable all privileged ports set this to zero. It also checks for
overlap with the local port range. The privileged and local range may
not overlap.
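
The two rules in this paragraph can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel implementation: the function names are hypothetical, ports below the configured start are treated as privileged, and the overlap rule is read as "the low end of the local port range must not fall below the unprivileged start".

```c
/* Returns 1 if binding to the port needs privilege, given the first
 * unprivileged port for the namespace. A start of 0 disables all
 * privileged ports. */
static int sketch_port_is_privileged(int port, int unpriv_start)
{
	return port < unpriv_start;
}

/* Returns 1 if a proposed unprivileged start is acceptable: it must be
 * a valid port number, and the privileged range [0, new_start - 1] must
 * not overlap a local port range that begins at local_low. */
static int sketch_unpriv_start_valid(int new_start, int local_low)
{
	if (new_start < 0 || new_start > 65535)
		return 0;
	return local_low >= new_start;
}
```

With a start of 1024, port 80 is privileged and port 1024 is not; with a start of 0, no port is privileged.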

The use case for this change is to allow containerized processes to bind
to privileged ports, but prevent them from ever being allowed to modify
their container's network configuration. The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace. This modification was needed to allow the container manager
to disable a namespace's privileged port restrictions without exposing
control of the network namespace to processes in the user namespace.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Jan, 2017 1 commit

exec: Remove LSM_UNSAFE_PTRACE_CAP · 9227dd2a

Authored by Eric W. Biederman on Jan 23, 2017

With previous changes every location that tests for
LSM_UNSAFE_PTRACE_CAP also tests for LSM_UNSAFE_PTRACE making the
LSM_UNSAFE_PTRACE_CAP redundant, so remove it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

19 Jan, 2017 1 commit

LSM: Add /sys/kernel/security/lsm · d69dece5

Authored by Casey Schaufler on Jan 18, 2017

I am still tired of having to find indirect ways to determine
what security modules are active on a system. I have added
/sys/kernel/security/lsm, which contains a comma separated
list of the active security modules. No more groping around
in /proc/filesystems or other clever hacks.
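
A sketch of how a user-space tool might test for a module once it has read the contents of /sys/kernel/security/lsm into a string. The function name is hypothetical, and the handling of a trailing newline is a defensive assumption rather than something the commit message specifies; only whole comma separated entries are matched.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a complete entry in a comma separated
 * list such as "capability,yama,selinux", and 0 otherwise. A trailing
 * newline, if present, is treated as a separator. */
static int sketch_lsm_active(const char *list, const char *name)
{
	size_t want = strlen(name);
	const char *p = list;

	if (want == 0)
		return 0;

	while (*p) {
		/* Length of the current entry, up to the next separator. */
		size_t len = strcspn(p, ",\n");

		if (len == want && strncmp(p, name, len) == 0)
			return 1;
		p += len;
		if (*p)
			p++;	/* skip the separator */
	}
	return 0;
}
```

A caller would typically read the file with fopen() and fgets() and pass the resulting buffer as the first argument.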

Unchanged from previous versions except for being updated
to the latest security next branch.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>

13 Jan, 2017 2 commits

security,selinux,smack: kill security_task_wait hook · 3a2f5a59

Authored by Stephen Smalley on Jan 10, 2017

As reported by yangshukui, a permission denial from security_task_wait()
can lead to a soft lockup in zap_pid_ns_processes() since it only expects
sys_wait4() to return 0 or -ECHILD. Further, security_task_wait() can
in general lead to zombies; in the absence of some way to automatically
reparent a child process upon a denial, the hook is not useful.  Remove
the security hook and its implementations in SELinux and Smack.  Smack
already removed its check from its hook.

Reported-by: yangshukui <yangshukui@huawei.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: drop unused socket security classes · b4ba35c7

Authored by Stephen Smalley on Jan 11, 2017

Several of the extended socket classes introduced by
commit da69a530 ("selinux: support distinctions
among all network address families") are never used because
sockets can never be created with the associated address family.
Remove these unused socket security classes.  The removed classes
are bridge_socket for PF_BRIDGE, ib_socket for PF_IB, and mpls_socket
for PF_MPLS.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

09 Jan, 2017 8 commits

selinux: default to security isid in sel_make_bools() if no sid is found · 900fde06

Authored by Gary Tierney on Jan 09, 2017

Use SECINITSID_SECURITY as the default SID for booleans which don't have
a matching SID returned from security_genfs_sid(). Also downgrade the
error message to a warning to match this behavior.

This prevents the policy failing to load (and consequently the system
failing to boot) when there is no default genfscon statement matched for
the selinuxfs in the new policy.

Signed-off-by: Gary Tierney <gary.tierney@gmx.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: log errors when loading new policy · 4262fb51

Authored by Gary Tierney on Jan 09, 2017

Adds error logging to the code paths which can fail when loading a new
policy in sel_write_load().  If the policy fails to be loaded from
userspace then a warning message is printed, whereas if a failure occurs
after loading policy from userspace an error message will be printed
with details on where policy loading failed (recreating one of /classes/,
/policy_capabilities/, /booleans/ in the SELinux fs).

Also, if sel_make_bools() fails to obtain an SID for an entry in
/booleans/* an error will be printed indicating the path of the
boolean.

Signed-off-by: Gary Tierney <gary.tierney@gmx.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

proc,security: move restriction on writing /proc/pid/attr nodes to proc · b21507e2

Authored by Stephen Smalley on Jan 09, 2017

Processes can only alter their own security attributes via
/proc/pid/attr nodes.  This is presently enforced by each individual
security module and is also imposed by the Linux credentials
implementation, which only allows a task to alter its own credentials.
Move the check enforcing this restriction from the individual
security modules to proc_pid_attr_write() before calling the security hook,
and drop the unnecessary task argument to the security hook since it can
only ever be the current task.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: clean up cred usage and simplify · be0554c9

Authored by Stephen Smalley on Jan 09, 2017

SELinux was sometimes using the task "objective" credentials when
it could/should use the "subjective" credentials.  This was sometimes
hidden by the fact that we were unnecessarily passing around pointers
to the current task, making it appear as if the task could be something
other than current, so eliminate all such passing of current.  Inline
various permission checking helper functions that can be reduced to a
single avc_has_perm() call.

Since the credentials infrastructure only allows a task to alter
its own credentials, we can always assume that current must be the same
as the target task in selinux_setprocattr after the check. We likely
should move this check from selinux_setprocattr() to proc_pid_attr_write()
and drop the task argument to the security hook altogether; it can only
serve to confuse things.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: allow context mounts on tmpfs, ramfs, devpts within user namespaces · 01593d32

Authored by Stephen Smalley on Jan 09, 2017

commit aad82892 ("selinux: Add support for
unprivileged mounts from user namespaces") prohibited any use of context
mount options within non-init user namespaces.  However, this breaks
use of context mount options for tmpfs mounts within user namespaces,
which are being used by Docker/runc.  There is no reason to block such
usage for tmpfs, ramfs or devpts.  Exempt these filesystem types
from this restriction.

Before:
sh$ userns_child_exec  -p -m -U -M '0 1000 1' -G '0 1000 1' bash
sh# mount -t tmpfs -o context=system_u:object_r:user_tmp_t:s0:c13 none /tmp
mount: tmpfs is write-protected, mounting read-only
mount: cannot mount tmpfs read-only

After:
sh$ userns_child_exec  -p -m -U -M '0 1000 1' -G '0 1000 1' bash
sh# mount -t tmpfs -o context=system_u:object_r:user_tmp_t:s0:c13 none /tmp
sh# ls -Zd /tmp
unconfined_u:object_r:user_tmp_t:s0:c13 /tmp

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: handle ICMPv6 consistently with ICMP · ef37979a

Authored by Stephen Smalley on Jan 09, 2017

commit 79c8b348f215 ("selinux: support distinctions among all network
address families") mapped datagram ICMP sockets to the new icmp_socket
security class, but left ICMPv6 sockets unchanged. This change fixes
that oversight to handle both kinds of sockets consistently.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: add security in-core xattr support for tracefs · a2c7c6fb

Authored by Yongqin Liu on Jan 09, 2017

Since kernel 4.1 ftrace is supported as a new separate filesystem. It
gets automatically mounted by the kernel under the old path
/sys/kernel/debug/tracing. Because it now lives on a separate filesystem,
SELinux needs to be updated to also support setting SELinux labels
on tracefs inodes.  This is required for compatibility in Android
when moving to Linux 4.1 or newer.

Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: support distinctions among all network address families · da69a530

Authored by Stephen Smalley on Jan 09, 2017

Extend SELinux to support distinctions among all network address families
implemented by the kernel by defining new socket security classes
and mapping to them. Otherwise, many sockets are mapped to the generic
socket class and are indistinguishable in policy. This has come up
previously with regard to selectively allowing access to bluetooth sockets,
and more recently with regard to selectively allowing access to AF_ALG
sockets. Guido Trentalancia submitted a patch that took a similar approach
to add only support for distinguishing AF_ALG sockets, but this generalizes
his approach to handle all address families implemented by the kernel.
Socket security classes are also added for ICMP and SCTP sockets.
Socket security classes were not defined for AF_* values that are reserved
but unimplemented in the kernel, e.g. AF_NETBEUI, AF_SECURITY, AF_ASH,
AF_ECONET, AF_SNA, AF_WANPIPE.

Backward compatibility is provided by only enabling the finer-grained
socket classes if a new policy capability is set in the policy; older
policies will behave as before. The legacy redhat1 policy capability
that was only ever used in testing within Fedora for ptrace_child
is reclaimed for this purpose; as far as I can tell, this policy
capability is not enabled in any supported distro policy.

Add a pair of conditional compilation guards to detect when new AF_* values
are added so that we can update SELinux accordingly rather than having to
belatedly update it long after new address families are introduced.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

21 Dec, 2016 1 commit

selinux: use the kernel headers when building scripts/selinux · bfc5e3a6

Authored by Paul Moore on Dec 21, 2016

Commit 3322d0d6 ("selinux: keep SELinux in sync with new capability
definitions") added a check on the defined capabilities without
explicitly including the capability header file which caused problems
when building genheaders for users of clang/llvm. Resolve this by
using the kernel headers when building genheaders, which is arguably
the right thing to do regardless, and explicitly including the
kernel's capability.h header file in classmap.h. We also update the
mdp build: even though it wasn't causing an error, we really should
be using the headers from the kernel we are building.

Reported-by: Nicolas Iooss <nicolas.iooss@m4x.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>

23 Nov, 2016 1 commit

selinux: Convert isec->lock into a spinlock · 9287aed2

Authored by Andreas Gruenbacher on Nov 15, 2016

Convert isec->lock from a mutex into a spinlock.  Instead of holding
the lock while sleeping in inode_doinit_with_dentry, set
isec->initialized to LABEL_PENDING and release the lock.  Then, when
the sid has been determined, re-acquire the lock.  If isec->initialized
is still set to LABEL_PENDING, set isec->sid; otherwise, the sid has
been set by another task (LABEL_INITIALIZED) or invalidated
(LABEL_INVALID) in the meantime.
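
The two-phase pattern described above can be sketched as a single-threaded state model. The enum values mirror the LABEL_* names from the text, but the struct and function names are hypothetical and the locking itself is left out; comments mark where the lock would be held.

```c
/* State names follow the text; the numeric values are arbitrary here. */
enum sketch_label_state {
	SKETCH_LABEL_INVALID,
	SKETCH_LABEL_INITIALIZED,
	SKETCH_LABEL_PENDING
};

/* Hypothetical stand-in for the inode security struct. */
struct sketch_isec {
	enum sketch_label_state initialized;
	unsigned int sid;
};

/* Phase 1 (lock would be held): mark the label as pending, then the
 * caller drops the lock before doing the lookup that may sleep.
 * Returns 1 if a lookup is needed, 0 if the label is already set. */
static int sketch_begin_init(struct sketch_isec *isec)
{
	if (isec->initialized == SKETCH_LABEL_INITIALIZED)
		return 0;
	isec->initialized = SKETCH_LABEL_PENDING;
	return 1;
}

/* Phase 2 (lock re-acquired): publish the sid only if the state is
 * still pending. Otherwise another task set or invalidated the label
 * in the meantime and the result is discarded. Returns 1 if stored. */
static int sketch_finish_init(struct sketch_isec *isec, unsigned int sid)
{
	if (isec->initialized != SKETCH_LABEL_PENDING)
		return 0;
	isec->sid = sid;
	isec->initialized = SKETCH_LABEL_INITIALIZED;
	return 1;
}
```

Because nothing sleeps between taking and releasing the lock in either phase, a spinlock suffices in this model, which is the property the conversion relies on.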

This fixes a deadlock on gfs2 where

 * one task is in inode_doinit_with_dentry -> gfs2_getxattr, holds
   isec->lock, and tries to acquire the inode's glock, and

 * another task is in do_xmote -> inode_go_inval ->
   selinux_inode_invalidate_secctx, holds the inode's glock, and
   tries to acquire isec->lock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[PM: minor tweaks to keep checkpatch.pl happy]
Signed-off-by: Paul Moore <paul@paul-moore.com>

22 Nov, 2016 1 commit

selinux: keep SELinux in sync with new capability definitions · 3322d0d6

Authored by Stephen Smalley on Nov 18, 2016

When a new capability is defined, SELinux needs to be updated.
Trigger a build error if a new capability is defined without
corresponding update to security/selinux/include/classmap.h's
COMMON_CAP2_PERMS.  This is similar to BUILD_BUG_ON() guards
in the SELinux nlmsgtab code to ensure that SELinux tracks
new netlink message types as needed.
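
A build guard of this kind can be sketched with the preprocessor. The macro names and values below are hypothetical stand-ins chosen for illustration; they are not the kernel's definitions. The idea is simply that compilation stops as soon as the highest defined capability exceeds the highest one the permission list knows about.

```c
/* Hypothetical stand-ins: the highest capability number defined by the
 * headers, and the highest one covered by the permission list. */
#define SKETCH_CAP_LAST_CAP     37
#define SKETCH_LAST_MAPPED_CAP  37

/* Fail the build if a capability was added without updating the list. */
#if SKETCH_CAP_LAST_CAP > SKETCH_LAST_MAPPED_CAP
#error "New capability defined; please update the permission list."
#endif

/* Run-time view of the same condition, only so the sketch has
 * something observable once it compiles. */
static int sketch_caps_in_sync(void)
{
	return SKETCH_CAP_LAST_CAP <= SKETCH_LAST_MAPPED_CAP;
}
```

Raising SKETCH_CAP_LAST_CAP to 38 without touching the other macro would make this file fail to compile, which is the behaviour the commit message describes.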

Note that there is already a similar build guard in
security/selinux/hooks.c to detect when more than 64
capabilities are defined, since that will require adding
a third capability class to SELinux.

A nicer way to do this would be to extend scripts/selinux/genheaders
or a similar tool to auto-generate the necessary definitions and code
for SELinux capability checking from include/uapi/linux/capability.h.
AppArmor does something similar in its Makefile, although it only
needs to generate a single table of names.  That is left as future
work.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: reformat the description to keep checkpatch.pl happy]
Signed-off-by: Paul Moore <paul@paul-moore.com>

21 Nov, 2016 1 commit

selinux: normalize input to /sys/fs/selinux/enforce · ea49d10e

Authored by Stephen Smalley on Nov 18, 2016

At present, one can write any signed integer value to
/sys/fs/selinux/enforce and it will be stored,
e.g. echo -1 > /sys/fs/selinux/enforce or echo 2 >
/sys/fs/selinux/enforce. This makes no real difference
to the kernel, since it only ever cares if it is zero or non-zero,
but some userspace code compares it with 1 to decide if SELinux
is enforcing, and this could confuse it. Only a process that is
already root and is allowed the setenforce permission in SELinux
policy can write to /sys/fs/selinux/enforce, so this is not considered
to be a security issue, but it should be fixed.
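
One simple way to normalize such input is to collapse every non-zero value to 1. The following is a sketch of that idea with a hypothetical function name, not a copy of the kernel change.

```c
/* Map any integer written by user space to 0 (permissive) or
 * 1 (enforcing), so that readers comparing against 1 see a
 * consistent value. */
static int sketch_normalize_enforce(int value)
{
	return !!value;
}
```

Under this rule the values -1 and 2 from the examples above are both stored as 1.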

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>

16 Nov, 2016 1 commit

posix-timers: Make them configurable · baa73d9e

Authored by Nicolas Pitre on Nov 11, 2016

Some embedded systems have no use for them.  This removes about
25KB from the kernel binary size when configured out.

Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime, timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

15 Nov, 2016 4 commits

selinux: Clean up initialization of isec->sclass · 13457d07

Authored by Andreas Gruenbacher on Nov 10, 2016

Now that isec->initialized == LABEL_INITIALIZED implies that
isec->sclass is valid, skip such inodes immediately in
inode_doinit_with_dentry.

For the remaining inodes, initialize isec->sclass at the beginning of
inode_doinit_with_dentry to simplify the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

proc: Pass file mode to proc_pid_make_inode · db978da8

Authored by Andreas Gruenbacher on Nov 10, 2016

Pass the file mode of the proc inode to be created to
proc_pid_make_inode.  In proc_pid_make_inode, initialize inode->i_mode
before calling security_task_to_inode.  This allows selinux to set
isec->sclass right away without introducing "half-initialized" inode
security structs.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: Minor cleanups · 42059112

Authored by Andreas Gruenbacher on Nov 10, 2016

Fix the comment for function __inode_security_revalidate, which returns
an integer.

Use the LABEL_* constants consistently for isec->initialized.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

SELinux: Use GFP_KERNEL for selinux_parse_opts_str(). · 8931c3bd

Authored by Tetsuo Handa on Nov 14, 2016

Since selinux_parse_opts_str() is calling match_strdup() which uses
GFP_KERNEL, it is safe to use GFP_KERNEL from kcalloc() which is
called by selinux_parse_opts_str().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul Moore <paul@paul-moore.com>

20 Oct, 2016 1 commit

mm: Change vm_is_stack_for_task() to vm_is_stack_for_current() · d17af505

Authored by Andy Lutomirski on Sep 30, 2016

Asking for a non-current task's stack can't be done without races
unless the task is frozen in kernel mode.  As far as I know,
vm_is_stack_for_task() never had a safe non-current use case.

The __unused annotation is because some KSTK_ESP implementations
ignore their parameter, which IMO is further justification for this
patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Link: http://lkml.kernel.org/r/4c3f68f426e6c061ca98b4fc7ef85ffbb0a25b0c.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

10 Oct, 2016 1 commit

printk: reinstate KERN_CONT for printing continuation lines · 4bcc595c

Authored by Linus Torvalds on Oct 08, 2016

Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space. It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.

Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other. Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.

To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:

47492527 ("printk: add KERN_CONT annotation").

That continuation marker was initially an empty string, and didn't
actually make any semantic difference. But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.

To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.

5fd29d6c ("printk: clean up handling of log-levels and newlines")

and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.

Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.

You can see the beginning of that conversion in commits

e11fea92 ("kmsg: export printk records to the /dev/kmsg interface")
7ff9554b ("printk: convert byte-buffer to variable-length record buffer")

with a number of follow-up commits to fix some painful fallout from that
conversion. Overall, it took a couple of months to sort out most of
it. But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.

And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks. In
order to still log them as one record, the continuation lines need to be
attached to the previous record properly.

However the explicit continuation record marker that is actually useful
for this exact case was removed at around the same time by commit

61e99ab8 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

due to the incorrect belief that KERN_CONT wasn't meaningful. The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.

This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.

But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.

For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.

Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel). So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.

Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.

So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.

There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.

That said, we should strive to avoid the need for KERN_CONT. It does
result in real problems for logging, and should generally not be seen as
a good feature. If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.

But until that point, let's at least mark the code that relies on the hacky
multi-fragment kernel printk's. Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".

(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited. Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4bcc595c
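
The printk commit above notes that a message may carry both KERN_CONT and a log level, so the prefix parser has to iterate over several markers rather than stop after one. The user-space sketch below illustrates that loop. The marker encoding used here (an SOH byte followed by '0'..'7' for a level or 'c' for a continuation) and the names `parse_prefixes` and `struct prefix_info` are assumptions for illustration, not the kernel's actual code.

```c
#include <stddef.h>

/* Assumed marker encoding for this sketch: an ASCII SOH byte followed by
 * one character, '0'..'7' for a log level or 'c' for a continuation. */
#define SKETCH_SOH '\001'

struct prefix_info {
    int level;      /* -1 if the message carried no log level */
    int is_cont;    /* nonzero if a continuation marker was seen */
    size_t skipped; /* bytes of markers consumed from the start of the text */
};

/* Consume every leading marker rather than just the first one, so that a
 * message can carry both a continuation marker and a log level. */
static struct prefix_info parse_prefixes(const char *text)
{
    struct prefix_info info = { -1, 0, 0 };

    while (text[0] == SKETCH_SOH && text[1] != '\0') {
        char kind = text[1];

        if (kind >= '0' && kind <= '7')
            info.level = kind - '0';
        else if (kind == 'c')
            info.is_cont = 1;
        else
            break; /* unknown marker: treat the rest as message text */

        text += 2;
        info.skipped += 2;
    }
    return info;
}
```

In this sketch, a message that begins with a continuation marker followed by a level-6 marker is reported as a continuation with level 6 and four marker bytes skipped, while a message with no markers keeps level -1 so the caller can fall back to a default.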

08 Oct, 2016 1 commit

xattr: Add __vfs_{get,set,remove}xattr helpers · 5d6c3191

Authored by Andreas Gruenbacher on Sep 29, 2016

Right now, various places in the kernel check for the existence of
getxattr, setxattr, and removexattr inode operations and directly call
those operations.  Switch to helper functions and test for the IOP_XATTR
flag instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5d6c3191
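
The xattr commit above replaces open-coded checks for the inode operations with helpers that test a flag. A minimal user-space sketch of that pattern follows. Every type and name in it (`sketch_inode`, `SKETCH_IOP_XATTR`, `sketch_getxattr`, `sample_getxattr`) is a hypothetical stand-in rather than the kernel's definition, and returning -EOPNOTSUPP when the flag is clear is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; names are illustrative. */
#define SKETCH_IOP_XATTR 0x1u

struct sketch_inode;

struct sketch_inode_ops {
    long (*getxattr)(struct sketch_inode *inode, const char *name,
                     void *value, size_t size);
};

struct sketch_inode {
    unsigned int opflags;               /* SKETCH_IOP_XATTR if xattrs work */
    const struct sketch_inode_ops *ops;
};

/* Helper: callers no longer test for a non-NULL operation pointer; the
 * helper tests the flag and only then dispatches to the operation. */
static long sketch_getxattr(struct sketch_inode *inode, const char *name,
                            void *value, size_t size)
{
    if (!(inode->opflags & SKETCH_IOP_XATTR))
        return -EOPNOTSUPP;
    return inode->ops->getxattr(inode, name, value, size);
}

/* Hypothetical operation used only to exercise the helper: it ignores the
 * attribute name and always yields the 4-byte value "demo". */
static long sample_getxattr(struct sketch_inode *inode, const char *name,
                            void *value, size_t size)
{
    (void)inode;
    (void)name;
    if (size < 4)
        return -ERANGE;
    memcpy(value, "demo", 4);
    return 4;
}

static const struct sketch_inode_ops sample_ops = { sample_getxattr };
```

Keeping the capability test inside the helper means each call site makes one call and handles one error code, instead of repeating the check itself.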

28 Sep, 2016 1 commit

fs: Replace CURRENT_TIME with current_time() for inode timestamps · 078cd827

Authored by Deepa Dinamani on Sep 14, 2016

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

078cd827
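
The granularity point in the commit above can be illustrated with a small user-space sketch: a timestamp is rounded down to the resolution a filesystem can store. The function name `trunc_to_granularity` and the idea of passing the granularity in nanoseconds are assumptions for illustration; the kernel's current_time() is not reproduced here.

```c
#include <time.h>

#define NSEC_PER_SEC_SKETCH 1000000000L

/* Round a timestamp down to a filesystem's timestamp granularity, given in
 * nanoseconds. A granularity of 1 (or less) keeps the timestamp unchanged;
 * a granularity of one second or coarser drops the nanosecond part. */
static struct timespec trunc_to_granularity(struct timespec t, long gran_ns)
{
    if (gran_ns <= 1)
        return t;
    if (gran_ns >= NSEC_PER_SEC_SKETCH)
        t.tv_nsec = 0;
    else
        t.tv_nsec -= t.tv_nsec % gran_ns;
    return t;
}
```

For instance, with a granularity of 1000 ns, a timestamp with 123456789 ns becomes 123456000 ns, so a value written to one inode compares equal to the same value stored on a filesystem that only keeps microseconds.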

20 Sep, 2016 1 commit

lsm,audit,selinux: Introduce a new audit data type LSM_AUDIT_DATA_FILE · 43af5de7

Authored by Vivek Goyal on Sep 09, 2016

Right now the LSM_AUDIT_DATA_PATH type contains "struct path" in union "u"
of common_audit_data. This information is used to print the path of the
file; at the same time, it is also used to get to the dentry and inode.
That inode information is in turn used to get to the superblock and device
and to print device information.

This does not work well for layered filesystems like overlay where dentry
contained in path is overlay dentry and not the real dentry of underlying
file system. That means inode retrieved from dentry is also overlay
inode and not the real inode.

SELinux helpers like file_path_has_perm() are doing checks on inode
retrieved from file_inode(). This returns the real inode and not the
overlay inode. That means we are doing check on real inode but for audit
purposes we are printing details of overlay inode and that can be
confusing while debugging.

Hence, introduce a new type LSM_AUDIT_DATA_FILE which carries file
information, where the inode retrieved via file_inode() is the real inode.
That way the correct AVC denial information is given to the user.

For example, the following is an AVC denial before the patch.

  type=AVC msg=audit(1473360868.399:214): avc:  denied  { read open } for
    pid=1765 comm="cat"
    path="/root/.../overlay/container1/merged/readfile"
    dev="overlay" ino=21443
    scontext=unconfined_u:unconfined_r:test_overlay_client_t:s0:c10,c20
    tcontext=unconfined_u:object_r:test_overlay_files_ro_t:s0
    tclass=file permissive=0

It looks as follows after the patch.

  type=AVC msg=audit(1473360017.388:282): avc:  denied  { read open } for
    pid=2530 comm="cat"
    path="/root/.../overlay/container1/merged/readfile"
    dev="dm-0" ino=2377915
    scontext=unconfined_u:unconfined_r:test_overlay_client_t:s0:c10,c20
    tcontext=unconfined_u:object_r:test_overlay_files_ro_t:s0
    tclass=file permissive=0

Notice that now dev information points to "dm-0" device instead of
"overlay" device. This makes it clear that check failed on underlying
inode and not on the overlay inode.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
[PM: slight tweaks to the description to make checkpatch.pl happy]
Signed-off-by: Paul Moore <paul@paul-moore.com>

43af5de7

14 Sep, 2016 1 commit

selinux: fix error return code in policydb_read() · 9b6a9ecc

Authored by Wei Yongjun on Sep 10, 2016

Fix to return error code -EINVAL from the error handling case instead
of 0 (rc is overwritten to 0 when policyvers >=
POLICYDB_VERSION_ROLETRANS), as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
[PM: normalize "selinux" in patch subject, description line wrap]
Signed-off-by: Paul Moore <paul@paul-moore.com>

9b6a9ecc
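
The bug class fixed in the commit above (an error path that returns a stale rc of 0) can be shown with a short sketch. The functions `read_step`, `parse_buggy`, and `parse_fixed` are hypothetical and only mirror the general shape of the problem, not the code in policydb_read().

```c
#include <errno.h>

/* Hypothetical stand-in for an earlier step that succeeds and returns 0. */
static int read_step(void)
{
    return 0;
}

/* Buggy shape: rc was set to 0 by the successful step, so the failed
 * validation jumps to the error label and returns 0 by mistake. */
static int parse_buggy(int value_is_valid)
{
    int rc;

    rc = read_step();
    if (rc)
        goto bad;
    if (!value_is_valid)
        goto bad;       /* rc is still 0 here */
    return 0;
bad:
    return rc;
}

/* Fixed shape: set rc explicitly before taking the error path. */
static int parse_fixed(int value_is_valid)
{
    int rc;

    rc = read_step();
    if (rc)
        goto bad;
    rc = -EINVAL;
    if (!value_is_valid)
        goto bad;
    return 0;
bad:
    return rc;
}
```

With invalid input, the buggy variant reports success (0) while the fixed variant returns -EINVAL, which is the difference the commit describes.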

31 Aug, 2016 1 commit

selinux: fix overflow and 0 length allocations · 7c686af0

Authored by William Roberts on Aug 30, 2016

Throughout the SELinux LSM, values taken from sepolicy are
used in places where length == 0 or length == <saturated>
matter. Find and fix these.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

7c686af0
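
The two problem lengths named in the commit above can be illustrated with a user-space sketch of a length-prefixed string copy. The function name `copy_policy_string` and the choice of -EINVAL for rejected lengths are assumptions for illustration; the sketch uses malloc() where kernel code would use its own allocator.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy `len` bytes from `src` into a newly allocated, NUL-terminated string.
 * Rejects len == 0 (nothing to allocate) and len == UINT32_MAX (saturated:
 * a 32-bit len + 1 would wrap around to 0). */
static int copy_policy_string(char **out, const void *src, uint32_t len)
{
    char *str;

    if (len == 0 || len == UINT32_MAX)
        return -EINVAL;

    str = malloc((size_t)len + 1);
    if (!str)
        return -ENOMEM;

    memcpy(str, src, len);
    str[len] = '\0';
    *out = str;
    return 0;
}
```

Checking the length before computing `len + 1` keeps the allocation size and the later copy size consistent, so untrusted input cannot request a zero-sized buffer and then write past it.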

30 Aug, 2016 2 commits

selinux: initialize structures · 3bc7bcf6

Authored by William Roberts on Aug 23, 2016

libsepol pointed out an issue where it's possible to have
an uninitialized jmp and an invalid dereference; fix this.
While we're here, zero allocate all the *_val_to_struct
structures.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

3bc7bcf6

selinux: detect invalid ebitmap · 74d977b6

Authored by William Roberts on Aug 23, 2016

When count is 0 and the highbit is not zero, the ebitmap is not
valid and the internal node is not allocated. This causes issues
when routines like mls_context_isvalid() attempt to use
ebitmap_for_each_bit() and ebitmap_node_get_bit(), as they assume
a highbit > 0 will have a node allocated.
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

74d977b6
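
The invalid state described in the commit above (a nonzero highbit with a node count of 0) can be expressed as a simple consistency check. The struct `bitmap_header` and the function `bitmap_header_is_valid` are hypothetical simplifications for illustration, not the kernel's ebitmap types.

```c
#include <stdint.h>

/* Hypothetical, simplified header of a serialized bitmap: `highbit` is the
 * highest bit position covered, `count` is the number of nodes that follow. */
struct bitmap_header {
    uint32_t highbit;
    uint32_t count;
};

/* A nonzero highbit promises at least one node; zero nodes contradicts it.
 * Returns 1 if the header is consistent, 0 otherwise. */
static int bitmap_header_is_valid(const struct bitmap_header *h)
{
    if (h->highbit != 0 && h->count == 0)
        return 0;
    return 1;
}
```

Rejecting the inconsistent header at read time lets later code keep assuming that any bitmap with a nonzero highbit has at least one allocated node.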