Commit cb60e3e6 authored by Linus Torvalds

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security subsystem updates from James Morris:
 "New notable features:
   - The seccomp work from Will Drewry
   - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
   - Longer security labels for Smack from Casey Schaufler
   - Additional ptrace restriction modes for Yama by Kees Cook"

Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
  apparmor: fix long path failure due to disconnected path
  apparmor: fix profile lookup for unconfined
  ima: fix filename hint to reflect script interpreter name
  KEYS: Don't check for NULL key pointer in key_validate()
  Smack: allow for significantly longer Smack labels v4
  gfp flags for security_inode_alloc()?
  Smack: recursive tramsmute
  Yama: replace capable() with ns_capable()
  TOMOYO: Accept manager programs which do not start with / .
  KEYS: Add invalidation support
  KEYS: Do LRU discard in full keyrings
  KEYS: Permit in-place link replacement in keyring list
  KEYS: Perform RCU synchronisation on keys prior to key destruction
  KEYS: Announce key type (un)registration
  KEYS: Reorganise keys Makefile
  KEYS: Move the key config into security/keys/Kconfig
  KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
  Yama: remove an unused variable
  samples/seccomp: fix dependencies on arch macros
  Yama: add additional ptrace scopes
  ...
SECure COMPuting with filters
=============================
Introduction
------------
A large number of system calls are exposed to every userland process
with many of them going unused for the entire lifetime of the process.
As system calls change and mature, bugs are found and eradicated. A
certain subset of userland applications benefit by having a reduced set
of available system calls. The resulting set reduces the total kernel
surface exposed to the application. System call filtering is meant for
use with those applications.
Seccomp filtering provides a means for a process to specify a filter for
incoming system calls. The filter is expressed as a Berkeley Packet
Filter (BPF) program, as with socket filters, except that the data
operated on is related to the system call being made: system call
number and the system call arguments. This allows for expressive
filtering of system calls using a filter program language with a long
history of being exposed to userland and a straightforward data set.
Additionally, BPF makes it impossible for users of seccomp to fall prey
to time-of-check-time-of-use (TOCTOU) attacks that are common in system
call interposition frameworks. BPF programs may not dereference
pointers which constrains all filters to solely evaluating the system
call arguments directly.
What it isn't
-------------
System call filtering isn't a sandbox. It provides a clearly defined
mechanism for minimizing the exposed kernel surface. It is meant to be
a tool for sandbox developers to use. Beyond that, policy for logical
behavior and information flow should be managed with a combination of
other system hardening techniques and, potentially, an LSM of your
choosing. Expressive, dynamic filters provide further options down this
path (avoiding pathological sizes or selecting which of the multiplexed
system calls in socketcall() is allowed, for instance) which could be
construed, incorrectly, as a more complete sandboxing solution.
Usage
-----
An additional seccomp mode is added and is enabled using the same
prctl(2) call as the strict seccomp. If the architecture has
CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below:
PR_SET_SECCOMP:
Now takes an additional argument which specifies a new filter
using a BPF program.
The BPF program will be executed over struct seccomp_data
reflecting the system call number, arguments, and other
metadata. The BPF program must then return one of the
acceptable values to inform the kernel which action should be
taken.
Usage:
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
The 'prog' argument is a pointer to a struct sock_fprog which
will contain the filter program. If the program is invalid, the
call will return -1 and set errno to EINVAL.
If fork/clone and execve are allowed by @prog, any child
processes will be constrained to the same filters and system
call ABI as the parent.
Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace. If these are not
true, -EACCES will be returned. This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them.
Additionally, if prctl(2) is allowed by the attached filter,
additional filters may be layered on which will increase evaluation
time, but allow for further decreasing the attack surface during
execution of a process.
The above call returns 0 on success and non-zero on error.
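A minimal sketch of the sequence above, assuming an x86-64 process and headers
that already carry the seccomp filter definitions from this series; the choice
of getpid(2) as the filtered call is arbitrary, and the program also performs
the architecture check urged in the Pitfalls section below.

#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38  /* value from the prctl.h change in this series */
#endif

int main(void)
{
        struct sock_filter insns[] = {
                /* Check the architecture before trusting the syscall number. */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                         offsetof(struct seccomp_data, arch)),
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
                /* Load the syscall number and single out getpid(2). */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                         offsetof(struct seccomp_data, nr)),
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
                BPF_STMT(BPF_RET | BPF_K,
                         SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
                .len = sizeof(insns) / sizeof(insns[0]),
                .filter = insns,
        };

        /* Required unless the task holds CAP_SYS_ADMIN in its namespace. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
                return 1;
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
                return 1;

        /* Use the raw syscall so libc caching cannot hide the filter. */
        if (syscall(__NR_getpid) < 0)
                perror("getpid");       /* expected: Operation not permitted */
        return 0;
}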
Return values
-------------
A seccomp filter may return any of the following values. If multiple
filters exist, the return value for the evaluation of a given system
call will always use the highest-precedence value. (For example,
SECCOMP_RET_KILL will always take precedence.)
In precedence order, they are:
SECCOMP_RET_KILL:
Results in the task exiting immediately without executing the
system call. The exit status of the task (status & 0x7f) will
be SIGSYS, not SIGKILL.
SECCOMP_RET_TRAP:
Results in the kernel sending a SIGSYS signal to the triggering
task without executing the system call. The kernel will
rollback the register state to just before the system call
entry such that a signal handler in the task will be able to
inspect the ucontext_t->uc_mcontext registers and emulate
system call success or failure upon return from the signal
handler.
The SECCOMP_RET_DATA portion of the return value will be passed
as si_errno.
SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP.
SECCOMP_RET_ERRNO:
Results in the lower 16-bits of the return value being passed
to userland as the errno without executing the system call.
SECCOMP_RET_TRACE:
When returned, this value will cause the kernel to attempt to
notify a ptrace()-based tracer prior to executing the system
call. If there is no tracer present, -ENOSYS is returned to
userland and the system call is not executed.
A tracer will be notified if it requests PTRACE_O_TRACESECCOMP
using ptrace(PTRACE_SETOPTIONS). The tracer will be notified
of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of
the BPF program return value will be available to the tracer
via PTRACE_GETEVENTMSG.
SECCOMP_RET_ALLOW:
Results in the system call being executed.
If multiple filters exist, the return value for the evaluation of a
given system call will always use the highest-precedence value.
Precedence is only determined using the SECCOMP_RET_ACTION mask. When
multiple filters return values of the same precedence, only the
SECCOMP_RET_DATA from the most recently installed filter will be
returned.
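To make the composition of an action with its data concrete, the illustrative
program below (using the SECCOMP_RET_* values introduced by this series)
combines two filter results the same way the kernel does: the numerically
lower, less permissive action wins, and its data travels with it.

#include <stdio.h>
#include <linux/seccomp.h>

int main(void)
{
        /* Filters are evaluated newest first; ties keep the newer data. */
        unsigned int newest = SECCOMP_RET_TRACE | 5;
        unsigned int older  = SECCOMP_RET_ERRNO | 1;
        unsigned int ret = newest;

        if ((older & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
                ret = older;

        /* Prints action=0x50000 data=1: ERRNO outranks TRACE. */
        printf("action=0x%x data=%u\n",
               ret & SECCOMP_RET_ACTION, ret & SECCOMP_RET_DATA);
        return 0;
}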
Pitfalls
--------
The biggest pitfall to avoid during use is filtering on system call
number without checking the architecture value. Why? On any
architecture that supports multiple system call invocation conventions,
the system call numbers may vary based on the specific invocation. If
the numbers in the different calling conventions overlap, then checks in
the filters may be abused. Always check the arch value!
Example
-------
The samples/seccomp/ directory contains both an x86-specific example
and a more generic example of a higher level macro interface for BPF
program generation.
Adding architecture support
---------------------------
See arch/Kconfig for the authoritative requirements. In general, if an
architecture supports both ptrace_event and seccomp, it will be able to
support seccomp filter with minor fixup: SIGSYS support and seccomp return
value checking. Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER
to its arch-specific Kconfig.
......@@ -15,7 +15,7 @@ at hand.
Smack consists of three major components:
- The kernel
- A start-up script and a few modified applications
- Basic utilities, which are helpful but not required
- Configuration data
The kernel component of Smack is implemented as a Linux
......@@ -23,37 +23,28 @@ Security Modules (LSM) module. It requires netlabel and
works best with file systems that support extended attributes,
although xattr support is not strictly required.
It is safe to run a Smack kernel under a "vanilla" distribution.
Smack kernels use the CIPSO IP option. Some network
configurations are intolerant of IP options and can impede
access to systems that use them as Smack does.
The startup script etc-init.d-smack should be installed
in /etc/init.d/smack and should be invoked early in the
start-up process. On Fedora rc5.d/S02smack is recommended.
This script ensures that certain devices have the correct
Smack attributes and loads the Smack configuration if
any is defined. This script invokes two programs that
ensure configuration data is properly formatted. These
programs are /usr/sbin/smackload and /usr/sbin/smackcipso.
The system will run just fine without these programs,
but it will be difficult to set access rules properly.
A version of "ls" that provides a "-M" option to display
Smack labels on long listing is available.
The current git repositories for Smack user space are:
A hacked version of sshd that allows network logins by users
with specific Smack labels is available. This version does
not work for scp. You must set the /etc/ssh/sshd_config
line:
UsePrivilegeSeparation no
git@gitorious.org:meego-platform-security/smackutil.git
git@gitorious.org:meego-platform-security/libsmack.git
The format of /etc/smack/usr is:
These should make and install on most modern distributions.
There are three commands included in smackutil:
username smack
smackload - properly formats data for writing to /smack/load
smackcipso - properly formats data for writing to /smack/cipso
chsmack - display or set Smack extended attribute values
In keeping with the intent of Smack, configuration data is
minimal and not strictly required. The most important
configuration step is mounting the smackfs pseudo filesystem.
If smackutil is installed the startup script will take care
of this, but it can be done manually as well.
Add this line to /etc/fstab:
......@@ -61,19 +52,148 @@ Add this line to /etc/fstab:
and create the /smack directory for mounting.
Smack uses extended attributes (xattrs) to store file labels.
The command to set a Smack label on a file is:
Smack uses extended attributes (xattrs) to store labels on filesystem
objects. The attributes are stored in the extended attribute security
name space. A process must have CAP_MAC_ADMIN to change any of these
attributes.
The extended attributes that Smack uses are:
SMACK64
Used to make access control decisions. In almost all cases
the label given to a new filesystem object will be the label
of the process that created it.
SMACK64EXEC
The Smack label of a process that execs a program file with
this attribute set will run with this attribute's value.
SMACK64MMAP
Don't allow the file to be mmapped by a process whose Smack
label does not allow all of the access permitted to a process
with the label contained in this attribute. This is a very
specific use case for shared libraries.
SMACK64TRANSMUTE
Can only have the value "TRUE". If this attribute is present
on a directory when an object is created in the directory and
the Smack rule (more below) that permitted the write access
to the directory includes the transmute ("t") mode, the object
gets the label of the directory instead of the label of the
creating process. If the object being created is a directory
the SMACK64TRANSMUTE attribute is set as well.
SMACK64IPIN
This attribute is only available on file descriptors for sockets.
Use the Smack label in this attribute for access control
decisions on packets being delivered to this socket.
SMACK64IPOUT
This attribute is only available on file descriptors for sockets.
Use the Smack label in this attribute for access control
decisions on packets coming from this socket.
There are multiple ways to set a Smack label on a file:
# attr -S -s SMACK64 -V "value" path
# chsmack -a value path
NOTE: Smack labels are limited to 23 characters. The attr command
does not enforce this restriction and can be used to set
invalid Smack labels on files.
If you don't do anything special all users will get the floor ("_")
label when they log in. If you do want to log in via the hacked ssh
at other labels, use the attr command to set the smack value on the
home directory and its contents.
A process can see the smack label it is running with by
reading /proc/self/attr/current. A process with CAP_MAC_ADMIN
can set the process smack by writing there.
Most Smack configuration is accomplished by writing to files
in the smackfs filesystem. This pseudo-filesystem is usually
mounted on /smack.
access
This interface reports whether a subject with the specified
Smack label has a particular access to an object with a
specified Smack label. Write a fixed format access rule to
this file. The next read will indicate whether the access
would be permitted. The text will be either "1" indicating
access, or "0" indicating denial.
access2
This interface reports whether a subject with the specified
Smack label has a particular access to an object with a
specified Smack label. Write a long format access rule to
this file. The next read will indicate whether the access
would be permitted. The text will be either "1" indicating
access, or "0" indicating denial.
ambient
This contains the Smack label applied to unlabeled network
packets.
cipso
This interface allows a specific CIPSO header to be assigned
to a Smack label. The format accepted on write is:
"%24s%4d%4d"["%4d"]...
The first string is a fixed Smack label. The first number is
the level to use. The second number is the number of categories.
The following numbers are the categories.
"level-3-cats-5-19 3 2 5 19"
cipso2
This interface allows a specific CIPSO header to be assigned
to a Smack label. The format accepted on write is:
"%s%4d%4d"["%4d"]...
The first string is a long Smack label. The first number is
the level to use. The second number is the number of categories.
The following numbers are the categories.
"level-3-cats-5-19 3 2 5 19"
direct
This contains the CIPSO level used for Smack direct label
representation in network packets.
doi
This contains the CIPSO domain of interpretation used in
network packets.
load
This interface allows access control rules in addition to
the system defined rules to be specified. The format accepted
on write is:
"%24s%24s%5s"
where the first string is the subject label, the second the
object label, and the third the requested access. The access
string may contain only the characters "rwxat-", and specifies
which sort of access is allowed. The "-" is a placeholder for
permissions that are not allowed. The string "r-x--" would
specify read and execute access. Labels are limited to 23
characters in length.
load2
This interface allows access control rules in addition to
the system defined rules to be specified. The format accepted
on write is:
"%s %s %s"
where the first string is the subject label, the second the
object label, and the third the requested access. The access
string may contain only the characters "rwxat-", and specifies
which sort of access is allowed. The "-" is a placeholder for
permissions that are not allowed. The string "r-x--" would
specify read and execute access.
load-self
This interface allows process specific access rules to be
defined. These rules are only consulted if access would
otherwise be permitted, and are intended to provide additional
restrictions on the process. The format is the same as for
the load interface.
load-self2
This interface allows process specific access rules to be
defined. These rules are only consulted if access would
otherwise be permitted, and are intended to provide additional
restrictions on the process. The format is the same as for
the load2 interface.
logging
This contains the Smack logging state.
mapped
This contains the CIPSO level used for Smack mapped label
representation in network packets.
netlabel
This interface allows specific internet addresses to be
treated as single label hosts. Packets are sent to single
label hosts without CIPSO headers, but only from processes
that have Smack write access to the host label. All packets
received from single label hosts are given the specified
label. The format accepted on write is:
"%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
onlycap
This contains the label processes must have for CAP_MAC_ADMIN
and CAP_MAC_OVERRIDE to be effective. If this file is empty
these capabilities are effective for processes with any
label. The value is set by writing the desired label to the
file or cleared by writing "-" to the file.
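As a purely illustrative sketch of the load2 and access2 interfaces described
above: the labels are invented, smackfs is assumed to be mounted on /smack,
and the caller is assumed to hold CAP_MAC_ADMIN for the rule write.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char rule[]  = "TopSecret Secret rx";  /* subject object access */
        const char query[] = "TopSecret Secret r";
        char answer[2] = "";
        int fd;

        /* Long-format rules are plain "%s %s %s" writes to load2. */
        fd = open("/smack/load2", O_WRONLY);
        if (fd < 0 || write(fd, rule, strlen(rule)) < 0)
                perror("load2");
        if (fd >= 0)
                close(fd);

        /* access2 answers on the same descriptor the query was written to. */
        fd = open("/smack/access2", O_RDWR);
        if (fd >= 0 && write(fd, query, strlen(query)) > 0 &&
            read(fd, answer, 1) == 1)
                printf("read access %s\n",
                       answer[0] == '1' ? "allowed" : "denied");
        if (fd >= 0)
                close(fd);
        return 0;
}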
You can add access rules in /etc/smack/accesses. They take the form:
......@@ -83,10 +203,6 @@ access is a combination of the letters rwxa which specify the
kind of access permitted a subject with subjectlabel on an
object with objectlabel. If there is no rule no access is allowed.
A process can see the smack label it is running with by
reading /proc/self/attr/current. A privileged process can
set the process smack by writing there.
Look for additional programs on http://schaufler-ca.com
From the Smack Whitepaper:
......@@ -186,7 +302,7 @@ team. Smack labels are unstructured, case sensitive, and the only operation
ever performed on them is comparison for equality. Smack labels cannot
contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
(quote) and '"' (double-quote) characters.
Smack labels cannot begin with a '-', which is reserved for special options.
Smack labels cannot begin with a '-'. This is reserved for special options.
There are some predefined labels:
......@@ -194,7 +310,7 @@ There are some predefined labels:
^ Pronounced "hat", a single circumflex character.
* Pronounced "star", a single asterisk character.
? Pronounced "huh", a single question mark character.
@ Pronounced "Internet", a single at sign character.
@ Pronounced "web", a single at sign character.
Every task on a Smack system is assigned a label. System tasks, such as
init(8) and systems daemons, are run with the floor ("_") label. User tasks
......@@ -246,13 +362,14 @@ The format of an access rule is:
Where subject-label is the Smack label of the task, object-label is the Smack
label of the thing being accessed, and access is a string specifying the sort
of access allowed. The Smack labels are limited to 23 characters. The access
specification is searched for letters that describe access modes:
of access allowed. The access specification is searched for letters that
describe access modes:
a: indicates that append access should be granted.
r: indicates that read access should be granted.
w: indicates that write access should be granted.
x: indicates that execute access should be granted.
t: indicates that the rule requests transmutation.
Uppercase values for the specification letters are allowed as well.
Access mode specifications can be in any order. Examples of acceptable rules
......@@ -273,7 +390,7 @@ Examples of unacceptable rules are:
Spaces are not allowed in labels. Since a subject always has access to files
with the same label specifying a rule for that case is pointless. Only
valid letters (rwxaRWXA) and the dash ('-') character are allowed in
valid letters (rwxatRWXAT) and the dash ('-') character are allowed in
access specifications. The dash is a placeholder, so "a-r" is the same
as "ar". A lone dash is used to specify that no access should be allowed.
......@@ -297,6 +414,13 @@ but not any of its attributes by the circumstance of having read access to the
containing directory but not to the differently labeled file. This is an
artifact of the file name being data in the directory, not a part of the file.
If a directory is marked as transmuting (SMACK64TRANSMUTE=TRUE) and the
access rule that allows a process to create an object in that directory
includes 't' access, the label assigned to the new object will be that
of the directory, not the creating process. This makes it much easier
for two processes with different labels to share data without granting
access to all of their files.
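For example (the directory path is hypothetical, and CAP_MAC_ADMIN is needed
as noted earlier), such a shared directory could be marked with:

# attr -S -s SMACK64TRANSMUTE -V TRUE /shared/project

provided the rule that grants the writers access to the directory's label
also carries the "t" mode.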
IPC objects, message queues, semaphore sets, and memory segments exist in flat
namespaces and access requests are only required to match the object in
question.
......
......@@ -34,7 +34,7 @@ parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still
work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID"
still work as root).
For software that has defined application-specific relationships
In mode 1, software that has defined application-specific relationships
between a debugging process and its inferior (crash handlers, etc),
prctl(PR_SET_PTRACER, pid, ...) can be used. An inferior can declare which
other process (and its descendants) are allowed to call PTRACE_ATTACH
......@@ -46,6 +46,8 @@ restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)
so that any otherwise allowed process (even those in external pid namespaces)
may attach.
These restrictions do not change how ptrace via PTRACE_TRACEME operates.
The sysctl settings are:
0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
......@@ -60,6 +62,12 @@ The sysctl settings are:
inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
an allowed debugger PID to call PTRACE_ATTACH on the inferior.
2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
with PTRACE_ATTACH.
3 - no attach: no processes may use ptrace with PTRACE_ATTACH. Once set,
this sysctl cannot be changed to a lower value.
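A minimal sketch of the mode 1 exception described above, with the parent
standing in for a hypothetical crash handler (PR_SET_PTRACER is defined
locally in case the installed prctl.h predates Yama):

#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

#ifndef PR_SET_PTRACER
#define PR_SET_PTRACER 0x59616d61
#endif

int main(void)
{
        /* Declare the parent as the only process allowed to PTRACE_ATTACH. */
        if (prctl(PR_SET_PTRACER, getppid(), 0, 0, 0))
                perror("PR_SET_PTRACER");  /* e.g. kernel built without Yama */

        /* ... the rest of the program runs under ptrace_scope protection ... */
        return 0;
}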
The original children-only logic was based on the restrictions in grsecurity.
==============================================================
......@@ -805,6 +805,23 @@ The keyctl syscall functions are:
kernel and resumes executing userspace.
(*) Invalidate a key.
long keyctl(KEYCTL_INVALIDATE, key_serial_t key);
This function marks a key as being invalidated and then wakes up the
garbage collector. The garbage collector immediately removes invalidated
keys from all keyrings and deletes the key when its reference count
reaches zero.
Keys that are marked invalidated become invisible to normal key operations
immediately, though they are still visible in /proc/keys until deleted
(they're marked with an 'i' flag).
A process must have search permission on the key for this function to be
successful.
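A minimal userspace sketch, assuming the keyutils library (link with
-lkeyutils) and defining KEYCTL_INVALIDATE locally in case the installed
headers predate it; the key is created only so there is something to
invalidate:

#include <stdio.h>
#include <keyutils.h>

#ifndef KEYCTL_INVALIDATE
#define KEYCTL_INVALIDATE 21    /* matches the keyctl.h addition in this series */
#endif

int main(void)
{
        /* Create a throwaway key in the session keyring. */
        key_serial_t key = add_key("user", "example", "data", 4,
                                   KEY_SPEC_SESSION_KEYRING);
        if (key < 0) {
                perror("add_key");
                return 1;
        }

        /* Mark it invalidated; the garbage collector then removes it. */
        if (keyctl(KEYCTL_INVALIDATE, key) < 0)
                perror("KEYCTL_INVALIDATE");
        return 0;
}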
===============
KERNEL SERVICES
===============
......
......@@ -1733,6 +1733,7 @@ S: Supported
F: include/linux/capability.h
F: security/capability.c
F: security/commoncap.c
F: kernel/capability.c
CELL BROADBAND ENGINE ARCHITECTURE
M: Arnd Bergmann <arnd@arndb.de>
......@@ -5950,7 +5951,7 @@ SECURITY SUBSYSTEM
M: James Morris <james.l.morris@oracle.com>
L: linux-security-module@vger.kernel.org (suggested Cc:)
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
W: http://security.wiki.kernel.org/
W: http://kernsec.org/
S: Supported
F: security/
......
......@@ -231,4 +231,27 @@ config HAVE_CMPXCHG_DOUBLE
config ARCH_WANT_OLD_COMPAT_IPC
bool
config HAVE_ARCH_SECCOMP_FILTER
bool
help
An arch should select this symbol if it provides all of these things:
- syscall_get_arch()
- syscall_get_arguments()
- syscall_rollback()
- syscall_set_return_value()
- SIGSYS siginfo_t support
- secure_computing is called from a ptrace_event()-safe context
- secure_computing return value is checked and a return value of -1
results in the system call being skipped immediately.
config SECCOMP_FILTER
def_bool y
depends on HAVE_ARCH_SECCOMP_FILTER && SECCOMP && NET
help
Enable tasks to build secure computing environments defined
in terms of Berkeley Packet Filter programs which implement
task-defined system call filtering policies.
See Documentation/prctl/seccomp_filter.txt for details.
source "kernel/gcov/Kconfig"
......@@ -136,7 +136,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
secure_computing(regs->r12);
secure_computing_strict(regs->r12);
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
......
......@@ -535,7 +535,7 @@ static inline int audit_arch(void)
asmlinkage void syscall_trace_enter(struct pt_regs *regs)
{
/* do the secure computing check first */
secure_computing(regs->regs[2]);
secure_computing_strict(regs->regs[2]);
if (!(current->ptrace & PT_PTRACED))
goto out;
......
......@@ -1710,7 +1710,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
secure_computing(regs->gpr[0]);
secure_computing_strict(regs->gpr[0]);
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
......
......@@ -719,7 +719,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
long ret = 0;
/* Do the secure computing check first. */
secure_computing(regs->gprs[2]);
secure_computing_strict(regs->gprs[2]);
/*
* The sysc_tracesys code in entry.S stored the system
......
......@@ -503,7 +503,7 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
secure_computing(regs->regs[0]);
secure_computing_strict(regs->regs[0]);
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
......
......@@ -522,7 +522,7 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
{
long long ret = 0;
secure_computing(regs->regs[9]);
secure_computing_strict(regs->regs[9]);
if (test_thread_flag(TIF_SYSCALL_TRACE) &&
tracehook_report_syscall_entry(regs))
......
......@@ -583,6 +583,9 @@ config SYSVIPC_COMPAT
depends on COMPAT && SYSVIPC
default y
config KEYS_COMPAT
def_bool y if COMPAT && KEYS
endmenu
source "net/Kconfig"
......
......@@ -1062,7 +1062,7 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
int ret = 0;
/* do the secure computing check first */
secure_computing(regs->u_regs[UREG_G1]);
secure_computing_strict(regs->u_regs[UREG_G1]);
if (test_thread_flag(TIF_SYSCALL_TRACE))
ret = tracehook_report_syscall_entry(regs);
......
......@@ -74,7 +74,7 @@ sys_call_table32:
.word sys_timer_delete, compat_sys_timer_create, sys_ni_syscall, compat_sys_io_setup, sys_io_destroy
/*270*/ .word sys32_io_submit, sys_io_cancel, compat_sys_io_getevents, sys32_mq_open, sys_mq_unlink
.word compat_sys_mq_timedsend, compat_sys_mq_timedreceive, compat_sys_mq_notify, compat_sys_mq_getsetattr, compat_sys_waitid
/*280*/ .word sys32_tee, sys_add_key, sys_request_key, sys_keyctl, compat_sys_openat
/*280*/ .word sys32_tee, sys_add_key, sys_request_key, compat_sys_keyctl, compat_sys_openat
.word sys_mkdirat, sys_mknodat, sys_fchownat, compat_sys_futimesat, compat_sys_fstatat64
/*290*/ .word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
.word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
......
......@@ -83,6 +83,7 @@ config X86
select GENERIC_IOMAP
select DCACHE_WORD_ACCESS
select GENERIC_SMP_IDLE_THREAD
select HAVE_ARCH_SECCOMP_FILTER
config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)
......
......@@ -67,6 +67,10 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
switch (from->si_code >> 16) {
case __SI_FAULT >> 16:
break;
case __SI_SYS >> 16:
put_user_ex(from->si_syscall, &to->si_syscall);
put_user_ex(from->si_arch, &to->si_arch);
break;
case __SI_CHLD >> 16:
if (ia32) {
put_user_ex(from->si_utime, &to->si_utime);
......
......@@ -144,6 +144,12 @@ typedef struct compat_siginfo {
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
struct {
unsigned int _call_addr; /* calling insn */
int _syscall; /* triggering system call number */
unsigned int _arch; /* AUDIT_ARCH_* of syscall */
} _sigsys;
} _sifields;
} compat_siginfo_t;
......
......@@ -13,9 +13,11 @@
#ifndef _ASM_X86_SYSCALL_H
#define _ASM_X86_SYSCALL_H
#include <linux/audit.h>
#include <linux/sched.h>
#include <linux/err.h>
#include <asm/asm-offsets.h> /* For NR_syscalls */
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
extern const unsigned long sys_call_table[];
......@@ -88,6 +90,12 @@ static inline void syscall_set_arguments(struct task_struct *task,
memcpy(&regs->bx + i, args, n * sizeof(args[0]));
}
static inline int syscall_get_arch(struct task_struct *task,
struct pt_regs *regs)
{
return AUDIT_ARCH_I386;
}
#else /* CONFIG_X86_64 */
static inline void syscall_get_arguments(struct task_struct *task,
......@@ -212,6 +220,25 @@ static inline void syscall_set_arguments(struct task_struct *task,
}
}
static inline int syscall_get_arch(struct task_struct *task,
struct pt_regs *regs)
{
#ifdef CONFIG_IA32_EMULATION
/*
* TS_COMPAT is set for 32-bit syscall entry and then
* remains set until we return to user mode.
*
* TIF_IA32 tasks should always have TS_COMPAT set at
* system call time.
*
* x32 tasks should be considered AUDIT_ARCH_X86_64.
*/
if (task_thread_info(task)->status & TS_COMPAT)
return AUDIT_ARCH_I386;
#endif
/* Both x32 and x86_64 are considered "64-bit". */
return AUDIT_ARCH_X86_64;
}
#endif /* CONFIG_X86_32 */
#endif /* _ASM_X86_SYSCALL_H */
......@@ -1480,7 +1480,11 @@ long syscall_trace_enter(struct pt_regs *regs)
regs->flags |= X86_EFLAGS_TF;
/* do the secure computing check first */
secure_computing(regs->orig_ax);
if (secure_computing(regs->orig_ax)) {
/* seccomp failures shouldn't expose any additional code. */
ret = -1L;
goto out;
}
if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
ret = -1L;
......@@ -1505,6 +1509,7 @@ long syscall_trace_enter(struct pt_regs *regs)
regs->dx, regs->r10);
#endif
out:
return ret ?: regs->orig_ax;
}
......
......@@ -1245,6 +1245,13 @@ static int check_unsafe_exec(struct linux_binprm *bprm)
bprm->unsafe |= LSM_UNSAFE_PTRACE;
}
/*
* This isn't strictly necessary, but it makes it harder for LSMs to
* mess up.
*/
if (current->no_new_privs)
bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
n_fs = 1;
spin_lock(&p->fs->lock);
rcu_read_lock();
......@@ -1288,7 +1295,8 @@ int prepare_binprm(struct linux_binprm *bprm)
bprm->cred->euid = current_euid();
bprm->cred->egid = current_egid();
if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
!current->no_new_privs) {
/* Set-uid? */
if (mode & S_ISUID) {
bprm->per_clear |= PER_CLEAR_ON_SETID;
......
......@@ -681,7 +681,7 @@ static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
f->f_op = fops_get(inode->i_fop);
error = security_dentry_open(f, cred);
error = security_file_open(f, cred);
if (error)
goto cleanup_all;
......
......@@ -98,9 +98,18 @@ typedef struct siginfo {
__ARCH_SI_BAND_T _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
/* SIGSYS */
struct {
void __user *_call_addr; /* calling user insn */
int _syscall; /* triggering system call number */
unsigned int _arch; /* AUDIT_ARCH_* of syscall */
} _sigsys;
} _sifields;
} __ARCH_SI_ATTRIBUTES siginfo_t;
/* If the arch shares siginfo, then it has SIGSYS. */
#define __ARCH_SIGSYS
#endif
/*
......@@ -124,6 +133,11 @@ typedef struct siginfo {
#define si_addr_lsb _sifields._sigfault._addr_lsb
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
#ifdef __ARCH_SIGSYS
#define si_call_addr _sifields._sigsys._call_addr
#define si_syscall _sifields._sigsys._syscall
#define si_arch _sifields._sigsys._arch
#endif
#ifdef __KERNEL__
#define __SI_MASK 0xffff0000u
......@@ -134,6 +148,7 @@ typedef struct siginfo {
#define __SI_CHLD (4 << 16)
#define __SI_RT (5 << 16)
#define __SI_MESGQ (6 << 16)
#define __SI_SYS (7 << 16)
#define __SI_CODE(T,N) ((T) | ((N) & 0xffff))
#else
#define __SI_KILL 0
......@@ -143,6 +158,7 @@ typedef struct siginfo {
#define __SI_CHLD 0
#define __SI_RT 0
#define __SI_MESGQ 0
#define __SI_SYS 0
#define __SI_CODE(T,N) (N)
#endif
......@@ -239,6 +255,12 @@ typedef struct siginfo {
#define POLL_HUP (__SI_POLL|6) /* device disconnected */
#define NSIGPOLL 6
/*
* SIGSYS si_codes
*/
#define SYS_SECCOMP (__SI_SYS|1) /* seccomp triggered */
#define NSIGSYS 1
/*
* sigevent definitions
*
......
......@@ -142,4 +142,18 @@ void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
unsigned int i, unsigned int n,
const unsigned long *args);
/**
* syscall_get_arch - return the AUDIT_ARCH for the current system call
* @task: task of interest, must be in system call entry tracing
* @regs: task_pt_regs() of @task
*
* Returns the AUDIT_ARCH_* based on the system call convention in use.
*
* It's only valid to call this when @task is stopped on entry to a system
* call, due to %TIF_SYSCALL_TRACE, %TIF_SYSCALL_AUDIT, or %TIF_SECCOMP.
*
* Architectures which permit CONFIG_HAVE_ARCH_SECCOMP_FILTER must
* provide an implementation of this.
*/
int syscall_get_arch(struct task_struct *task, struct pt_regs *regs);
#endif /* _ASM_SYSCALL_H */
......@@ -24,7 +24,7 @@ struct keyring_list {
unsigned short maxkeys; /* max keys this list can hold */
unsigned short nkeys; /* number of keys currently held */
unsigned short delkey; /* key to be unlinked by RCU */
struct key *keys[0];
struct key __rcu *keys[0];
};
......
......@@ -330,6 +330,7 @@ header-y += scc.h
header-y += sched.h
header-y += screen_info.h
header-y += sdla.h
header-y += seccomp.h
header-y += securebits.h
header-y += selinux_netlink.h
header-y += sem.h
......
......@@ -463,7 +463,7 @@ extern void audit_putname(const char *name);
extern void __audit_inode(const char *name, const struct dentry *dentry);
extern void __audit_inode_child(const struct dentry *dentry,
const struct inode *parent);
extern void __audit_seccomp(unsigned long syscall);
extern void __audit_seccomp(unsigned long syscall, long signr, int code);
extern void __audit_ptrace(struct task_struct *t);
static inline int audit_dummy_context(void)
......@@ -508,10 +508,10 @@ static inline void audit_inode_child(const struct dentry *dentry,
}
void audit_core_dumps(long signr);
static inline void audit_seccomp(unsigned long syscall)
static inline void audit_seccomp(unsigned long syscall, long signr, int code)
{
if (unlikely(!audit_dummy_context()))
__audit_seccomp(syscall);
__audit_seccomp(syscall, signr, code);
}
static inline void audit_ptrace(struct task_struct *t)
......@@ -634,7 +634,7 @@ extern int audit_signals;
#define audit_inode(n,d) do { (void)(d); } while (0)
#define audit_inode_child(i,p) do { ; } while (0)
#define audit_core_dumps(i) do { ; } while (0)
#define audit_seccomp(i) do { ; } while (0)
#define audit_seccomp(i,s,c) do { ; } while (0)
#define auditsc_get_stamp(c,t,s) (0)
#define audit_get_loginuid(t) (-1)
#define audit_get_sessionid(t) (-1)
......
......@@ -10,6 +10,7 @@
#ifdef __KERNEL__
#include <linux/atomic.h>
#include <linux/compat.h>
#endif
/*
......@@ -133,6 +134,16 @@ struct sock_fprog { /* Required for SO_ATTACH_FILTER. */
#ifdef __KERNEL__
#ifdef CONFIG_COMPAT
/*
* A struct sock_filter is architecture independent.
*/
struct compat_sock_fprog {
u16 len;
compat_uptr_t filter; /* struct sock_filter * */
};
#endif
struct sk_buff;
struct sock;
......@@ -233,6 +244,7 @@ enum {
BPF_S_ANC_RXHASH,
BPF_S_ANC_CPU,
BPF_S_ANC_ALU_XOR_X,
BPF_S_ANC_SECCOMP_LD_W,
};
#endif /* __KERNEL__ */
......
......@@ -124,7 +124,10 @@ static inline unsigned long is_key_possessed(const key_ref_t key_ref)
struct key {
atomic_t usage; /* number of references */
key_serial_t serial; /* key serial number */
struct rb_node serial_node;
union {
struct list_head graveyard_link;
struct rb_node serial_node;
};
struct key_type *type; /* type of key */
struct rw_semaphore sem; /* change vs change sem */
struct key_user *user; /* owner of this key */
......@@ -133,6 +136,7 @@ struct key {
time_t expiry; /* time at which key expires (or 0) */
time_t revoked_at; /* time at which key was revoked */
};
time_t last_used_at; /* last time used for LRU keyring discard */
uid_t uid;
gid_t gid;
key_perm_t perm; /* access permissions */
......@@ -156,6 +160,7 @@ struct key {
#define KEY_FLAG_USER_CONSTRUCT 4 /* set if key is being constructed in userspace */
#define KEY_FLAG_NEGATIVE 5 /* set if key is negative */
#define KEY_FLAG_ROOT_CAN_CLEAR 6 /* set if key can be cleared by root without permission */
#define KEY_FLAG_INVALIDATED 7 /* set if key has been invalidated */
/* the description string
* - this is used to match a key against search criteria
......@@ -199,6 +204,7 @@ extern struct key *key_alloc(struct key_type *type,
#define KEY_ALLOC_NOT_IN_QUOTA 0x0002 /* not in quota */
extern void key_revoke(struct key *key);
extern void key_invalidate(struct key *key);
extern void key_put(struct key *key);
static inline struct key *key_get(struct key *key)
......@@ -236,7 +242,7 @@ extern struct key *request_key_async_with_auxdata(struct key_type *type,
extern int wait_for_key_construction(struct key *key, bool intr);
extern int key_validate(struct key *key);
extern int key_validate(const struct key *key);
extern key_ref_t key_create_or_update(key_ref_t keyring,
const char *type,
......@@ -319,6 +325,7 @@ extern void key_init(void);
#define key_serial(k) 0
#define key_get(k) ({ NULL; })
#define key_revoke(k) do { } while(0)
#define key_invalidate(k) do { } while(0)
#define key_put(k) do { } while(0)
#define key_ref_put(k) do { } while(0)
#define make_key_ref(k, p) NULL
......
......@@ -55,5 +55,6 @@
#define KEYCTL_SESSION_TO_PARENT 18 /* apply session keyring to parent process */
#define KEYCTL_REJECT 19 /* reject a partially constructed key */
#define KEYCTL_INSTANTIATE_IOV 20 /* instantiate a partially constructed key */
#define KEYCTL_INVALIDATE 21 /* invalidate a key */
#endif /* _LINUX_KEYCTL_H */
......@@ -53,7 +53,6 @@ struct common_audit_data {
#define LSM_AUDIT_DATA_KMOD 8
#define LSM_AUDIT_DATA_INODE 9
#define LSM_AUDIT_DATA_DENTRY 10
struct task_struct *tsk;
union {
struct path path;
struct dentry *dentry;
......@@ -93,11 +92,6 @@ int ipv4_skb_to_auditdata(struct sk_buff *skb,
int ipv6_skb_to_auditdata(struct sk_buff *skb,
struct common_audit_data *ad, u8 *proto);
/* Initialize an LSM audit data structure. */
#define COMMON_AUDIT_DATA_INIT(_d, _t) \
{ memset((_d), 0, sizeof(struct common_audit_data)); \
(_d)->type = LSM_AUDIT_DATA_##_t; }
void common_lsm_audit(struct common_audit_data *a,
void (*pre_audit)(struct audit_buffer *, void *),
void (*post_audit)(struct audit_buffer *, void *));
......
......@@ -124,4 +124,19 @@
#define PR_SET_CHILD_SUBREAPER 36
#define PR_GET_CHILD_SUBREAPER 37
/*
* If no_new_privs is set, then operations that grant new privileges (i.e.
* execve) will either fail or not grant them. This affects suid/sgid,
* file capabilities, and LSMs.
*
* Operations that merely manipulate or drop existing privileges (setresuid,
* capset, etc.) will still work. Drop those privileges if you want them gone.
*
* Changing LSM security domain is considered a new privilege. So, for example,
* asking selinux for a specific new context (e.g. with runcon) will result
* in execve returning -EPERM.
*/
#define PR_SET_NO_NEW_PRIVS 38
#define PR_GET_NO_NEW_PRIVS 39
#endif /* _LINUX_PRCTL_H */
......@@ -58,6 +58,7 @@
#define PTRACE_EVENT_EXEC 4
#define PTRACE_EVENT_VFORK_DONE 5
#define PTRACE_EVENT_EXIT 6
#define PTRACE_EVENT_SECCOMP 7
/* Extended result codes which enabled by means other than options. */
#define PTRACE_EVENT_STOP 128
......@@ -69,8 +70,9 @@
#define PTRACE_O_TRACEEXEC (1 << PTRACE_EVENT_EXEC)
#define PTRACE_O_TRACEVFORKDONE (1 << PTRACE_EVENT_VFORK_DONE)
#define PTRACE_O_TRACEEXIT (1 << PTRACE_EVENT_EXIT)
#define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP)
#define PTRACE_O_MASK 0x0000007f
#define PTRACE_O_MASK 0x000000ff
#include <asm/ptrace.h>
......@@ -98,6 +100,7 @@
#define PT_TRACE_EXEC PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
#define PT_TRACE_VFORK_DONE PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
#define PT_TRACE_EXIT PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
#define PT_TRACE_SECCOMP PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP)
/* single stepping state bits (used on ARM and PA-RISC) */
#define PT_SINGLESTEP_BIT 31
......
......@@ -1341,6 +1341,8 @@ struct task_struct {
* execve */
unsigned in_iowait:1;
/* task may not gain privileges */
unsigned no_new_privs:1;
/* Revert to default priority/policy when forking */
unsigned sched_reset_on_fork:1;
......@@ -1450,7 +1452,7 @@ struct task_struct {
uid_t loginuid;
unsigned int sessionid;
#endif
seccomp_t seccomp;
struct seccomp seccomp;
/* Thread group tracking */
u32 parent_exec_id;
......
#ifndef _LINUX_SECCOMP_H
#define _LINUX_SECCOMP_H
#include <linux/compiler.h>
#include <linux/types.h>
/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */
#define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */
#define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */
#define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */
/*
* All BPF programs must return a 32-bit value.
* The bottom 16-bits are for optional return data.
* The upper 16-bits are ordered from least permissive values to most.
*
* The ordering ensures that a min_t() over composed return values always
* selects the least permissive choice.
*/
#define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */
#define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
#define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
#define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
#define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */
/* Masks for the return value sections. */
#define SECCOMP_RET_ACTION 0x7fff0000U
#define SECCOMP_RET_DATA 0x0000ffffU
/**
* struct seccomp_data - the format the BPF program executes over.
* @nr: the system call number
* @arch: indicates system call convention as an AUDIT_ARCH_* value
* as defined in <linux/audit.h>.
* @instruction_pointer: at the time of the system call.
* @args: up to 6 system call arguments always stored as 64-bit values
* regardless of the architecture.
*/
struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};
#ifdef __KERNEL__
#ifdef CONFIG_SECCOMP
#include <linux/thread_info.h>
#include <asm/seccomp.h>
typedef struct { int mode; } seccomp_t;
extern void __secure_computing(int);
static inline void secure_computing(int this_syscall)
struct seccomp_filter;
/**
* struct seccomp - the state of a seccomp'ed process
*
* @mode: indicates one of the valid values above for controlled
* system calls available to a process.
* @filter: The metadata and ruleset for determining what system calls
* are allowed for a task.
*
* @filter must only be accessed from the context of current as there
* is no locking.
*/
struct seccomp {
int mode;
struct seccomp_filter *filter;
};
extern int __secure_computing(int);
static inline int secure_computing(int this_syscall)
{
if (unlikely(test_thread_flag(TIF_SECCOMP)))
__secure_computing(this_syscall);
return __secure_computing(this_syscall);
return 0;
}
/* A wrapper for architectures supporting only SECCOMP_MODE_STRICT. */
static inline void secure_computing_strict(int this_syscall)
{
BUG_ON(secure_computing(this_syscall) != 0);
}
extern long prctl_get_seccomp(void);
extern long prctl_set_seccomp(unsigned long);
extern long prctl_set_seccomp(unsigned long, char __user *);
static inline int seccomp_mode(seccomp_t *s)
static inline int seccomp_mode(struct seccomp *s)
{
return s->mode;
}
......@@ -28,25 +93,41 @@ static inline int seccomp_mode(seccomp_t *s)
#include <linux/errno.h>
typedef struct { } seccomp_t;
struct seccomp { };
struct seccomp_filter { };
#define secure_computing(x) do { } while (0)
static inline int secure_computing(int this_syscall) { return 0; }
static inline void secure_computing_strict(int this_syscall) { return; }
static inline long prctl_get_seccomp(void)
{
return -EINVAL;
}
static inline long prctl_set_seccomp(unsigned long arg2)
static inline long prctl_set_seccomp(unsigned long arg2, char __user *arg3)
{
return -EINVAL;
}
static inline int seccomp_mode(seccomp_t *s)
static inline int seccomp_mode(struct seccomp *s)
{
return 0;
}
#endif /* CONFIG_SECCOMP */
#ifdef CONFIG_SECCOMP_FILTER
extern void put_seccomp_filter(struct task_struct *tsk);
extern void get_seccomp_filter(struct task_struct *tsk);
extern u32 seccomp_bpf_load(int off);
#else /* CONFIG_SECCOMP_FILTER */
static inline void put_seccomp_filter(struct task_struct *tsk)
{
return;
}
static inline void get_seccomp_filter(struct task_struct *tsk)
{
return;
}
#endif /* CONFIG_SECCOMP_FILTER */
#endif /* __KERNEL__ */
#endif /* _LINUX_SECCOMP_H */
......@@ -144,6 +144,7 @@ struct request_sock;
#define LSM_UNSAFE_SHARE 1
#define LSM_UNSAFE_PTRACE 2
#define LSM_UNSAFE_PTRACE_CAP 4
#define LSM_UNSAFE_NO_NEW_PRIVS 8
#ifdef CONFIG_MMU
extern int mmap_min_addr_handler(struct ctl_table *table, int write,
......@@ -639,10 +640,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* to receive an open file descriptor via socket IPC.
* @file contains the file structure being received.
* Return 0 if permission is granted.
*
* Security hook for dentry
*
* @dentry_open
* @file_open
* Save open-time permission checking state for later use upon
* file_permission, and recheck access if anything has changed
* since inode_permission.
......@@ -1497,7 +1495,7 @@ struct security_operations {
int (*file_send_sigiotask) (struct task_struct *tsk,
struct fown_struct *fown, int sig);
int (*file_receive) (struct file *file);
int (*dentry_open) (struct file *file, const struct cred *cred);
int (*file_open) (struct file *file, const struct cred *cred);
int (*task_create) (unsigned long clone_flags);
void (*task_free) (struct task_struct *task);
......@@ -1756,7 +1754,7 @@ int security_file_set_fowner(struct file *file);
int security_file_send_sigiotask(struct task_struct *tsk,
struct fown_struct *fown, int sig);
int security_file_receive(struct file *file);
int security_dentry_open(struct file *file, const struct cred *cred);
int security_file_open(struct file *file, const struct cred *cred);
int security_task_create(unsigned long clone_flags);
void security_task_free(struct task_struct *task);
int security_cred_alloc_blank(struct cred *cred, gfp_t gfp);
......@@ -2227,8 +2225,8 @@ static inline int security_file_receive(struct file *file)
return 0;
}
static inline int security_dentry_open(struct file *file,
const struct cred *cred)
static inline int security_file_open(struct file *file,
const struct cred *cred)
{
return 0;
}
......
......@@ -67,6 +67,7 @@
#include <linux/syscalls.h>
#include <linux/capability.h>
#include <linux/fs_struct.h>
#include <linux/compat.h>
#include "audit.h"
......@@ -2710,13 +2711,16 @@ void audit_core_dumps(long signr)
audit_log_end(ab);
}
void __audit_seccomp(unsigned long syscall)
void __audit_seccomp(unsigned long syscall, long signr, int code)
{
struct audit_buffer *ab;
ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND);
audit_log_abend(ab, "seccomp", SIGKILL);
audit_log_abend(ab, "seccomp", signr);
audit_log_format(ab, " syscall=%ld", syscall);
audit_log_format(ab, " compat=%d", is_compat_task());
audit_log_format(ab, " ip=0x%lx", KSTK_EIP(current));
audit_log_format(ab, " code=0x%x", code);
audit_log_end(ab);
}
......
......@@ -34,6 +34,7 @@
#include <linux/cgroup.h>
#include <linux/security.h>
#include <linux/hugetlb.h>
#include <linux/seccomp.h>
#include <linux/swap.h>
#include <linux/syscalls.h>
#include <linux/jiffies.h>
......@@ -206,6 +207,7 @@ void free_task(struct task_struct *tsk)
free_thread_info(tsk->stack);
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
put_seccomp_filter(tsk);
free_task_struct(tsk);
}
EXPORT_SYMBOL(free_task);
......@@ -1192,6 +1194,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
goto fork_out;
ftrace_graph_init_task(p);
get_seccomp_filter(p);
rt_mutex_init_task(p);
......
......@@ -3,16 +3,357 @@
*
* Copyright 2004-2005 Andrea Arcangeli <andrea@cpushare.com>
*
* This defines a simple but solid secure-computing mode.
* Copyright (C) 2012 Google, Inc.
* Will Drewry <wad@chromium.org>
*
* This defines a simple but solid secure-computing facility.
*
* Mode 1 uses a fixed list of allowed system calls.
* Mode 2 allows user-defined system call filters in the form
* of Berkeley Packet Filters/Linux Socket Filters.
*/
#include <linux/atomic.h>
#include <linux/audit.h>
#include <linux/seccomp.h>
#include <linux/sched.h>
#include <linux/compat.h>
#include <linux/sched.h>
#include <linux/seccomp.h>
/* #define SECCOMP_DEBUG 1 */
#define NR_SECCOMP_MODES 1
#ifdef CONFIG_SECCOMP_FILTER
#include <asm/syscall.h>
#include <linux/filter.h>
#include <linux/ptrace.h>
#include <linux/security.h>
#include <linux/slab.h>
#include <linux/tracehook.h>
#include <linux/uaccess.h>
/**
* struct seccomp_filter - container for seccomp BPF programs
*
* @usage: reference count to manage the object lifetime.
* get/put helpers should be used when accessing an instance
* outside of a lifetime-guarded section. In general, this
* is only needed for handling filters shared across tasks.
* @prev: points to a previously installed, or inherited, filter
* @len: the number of instructions in the program
* @insns: the BPF program instructions to evaluate
*
* seccomp_filter objects are organized in a tree linked via the @prev
* pointer. For any task, it appears to be a singly-linked list starting
* with current->seccomp.filter, the most recently attached or inherited filter.
* However, multiple filters may share a @prev node, by way of fork(), which
* results in a unidirectional tree existing in memory. This is similar to
* how namespaces work.
*
* seccomp_filter objects should never be modified after being attached
* to a task_struct (other than @usage).
*/
struct seccomp_filter {
atomic_t usage;
struct seccomp_filter *prev;
unsigned short len; /* Instruction count */
struct sock_filter insns[];
};
/* Limit any path through the tree to 256KB worth of instructions. */
#define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
/**
* get_u32 - returns a u32 offset into data
* @data: a unsigned 64 bit value
* @index: 0 or 1 to return the first or second 32-bits
*
* This inline exists to hide the length of unsigned long. If a 32-bit
* unsigned long is passed in, it will be extended and the top 32-bits will be
* 0. If it is a 64-bit unsigned long, then whatever data is resident will be
* properly returned.
*
* Endianness is explicitly ignored and left for BPF program authors to manage
* as per the specific architecture.
*/
static inline u32 get_u32(u64 data, int index)
{
return ((u32 *)&data)[index];
}
/* Helper for bpf_load below. */
#define BPF_DATA(_name) offsetof(struct seccomp_data, _name)
/**
* bpf_load: checks and returns a pointer to the requested offset
* @off: offset into struct seccomp_data to load from
*
* Returns the requested 32-bits of data.
* seccomp_check_filter() should assure that @off is 32-bit aligned
* and not out of bounds. Failure to do so is a BUG.
*/
u32 seccomp_bpf_load(int off)
{
struct pt_regs *regs = task_pt_regs(current);
if (off == BPF_DATA(nr))
return syscall_get_nr(current, regs);
if (off == BPF_DATA(arch))
return syscall_get_arch(current, regs);
if (off >= BPF_DATA(args[0]) && off < BPF_DATA(args[6])) {
unsigned long value;
int arg = (off - BPF_DATA(args[0])) / sizeof(u64);
int index = !!(off % sizeof(u64));
syscall_get_arguments(current, regs, arg, 1, &value);
return get_u32(value, index);
}
if (off == BPF_DATA(instruction_pointer))
return get_u32(KSTK_EIP(current), 0);
if (off == BPF_DATA(instruction_pointer) + sizeof(u32))
return get_u32(KSTK_EIP(current), 1);
/* seccomp_check_filter should make this impossible. */
BUG();
}
/**
* seccomp_check_filter - verify seccomp filter code
* @filter: filter to verify
* @flen: length of filter
*
* Takes a previously checked filter (by sk_chk_filter) and
* redirects all filter code that loads struct sk_buff data
* and related data through seccomp_bpf_load. It also
* enforces length and alignment checking of those loads.
*
* Returns 0 if the rule set is legal or -EINVAL if not.
*/
static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
{
int pc;
for (pc = 0; pc < flen; pc++) {
struct sock_filter *ftest = &filter[pc];
u16 code = ftest->code;
u32 k = ftest->k;
switch (code) {
case BPF_S_LD_W_ABS:
ftest->code = BPF_S_ANC_SECCOMP_LD_W;
/* 32-bit aligned and not out of bounds. */
if (k >= sizeof(struct seccomp_data) || k & 3)
return -EINVAL;
continue;
case BPF_S_LD_W_LEN:
ftest->code = BPF_S_LD_IMM;
ftest->k = sizeof(struct seccomp_data);
continue;
case BPF_S_LDX_W_LEN:
ftest->code = BPF_S_LDX_IMM;
ftest->k = sizeof(struct seccomp_data);
continue;
/* Explicitly include allowed calls. */
case BPF_S_RET_K:
case BPF_S_RET_A:
case BPF_S_ALU_ADD_K:
case BPF_S_ALU_ADD_X:
case BPF_S_ALU_SUB_K:
case BPF_S_ALU_SUB_X:
case BPF_S_ALU_MUL_K:
case BPF_S_ALU_MUL_X:
case BPF_S_ALU_DIV_X:
case BPF_S_ALU_AND_K:
case BPF_S_ALU_AND_X:
case BPF_S_ALU_OR_K:
case BPF_S_ALU_OR_X:
case BPF_S_ALU_LSH_K:
case BPF_S_ALU_LSH_X:
case BPF_S_ALU_RSH_K:
case BPF_S_ALU_RSH_X:
case BPF_S_ALU_NEG:
case BPF_S_LD_IMM:
case BPF_S_LDX_IMM:
case BPF_S_MISC_TAX:
case BPF_S_MISC_TXA:
case BPF_S_ALU_DIV_K:
case BPF_S_LD_MEM:
case BPF_S_LDX_MEM:
case BPF_S_ST:
case BPF_S_STX:
case BPF_S_JMP_JA:
case BPF_S_JMP_JEQ_K:
case BPF_S_JMP_JEQ_X:
case BPF_S_JMP_JGE_K:
case BPF_S_JMP_JGE_X:
case BPF_S_JMP_JGT_K:
case BPF_S_JMP_JGT_X:
case BPF_S_JMP_JSET_K:
case BPF_S_JMP_JSET_X:
continue;
default:
return -EINVAL;
}
}
return 0;
}
/**
* seccomp_run_filters - evaluates all seccomp filters against @syscall
* @syscall: number of the current system call
*
* Returns valid seccomp BPF response codes.
*/
static u32 seccomp_run_filters(int syscall)
{
struct seccomp_filter *f;
u32 ret = SECCOMP_RET_ALLOW;
/* Ensure unexpected behavior doesn't result in failing open. */
if (WARN_ON(current->seccomp.filter == NULL))
return SECCOMP_RET_KILL;
/*
* All filters in the list are evaluated and the lowest BPF return
* value always takes priority (ignoring the DATA).
*/
for (f = current->seccomp.filter; f; f = f->prev) {
u32 cur_ret = sk_run_filter(NULL, f->insns);
if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
ret = cur_ret;
}
return ret;
}
/**
* seccomp_attach_filter: Attaches a seccomp filter to current.
* @fprog: BPF program to install
*
* Returns 0 on success or an errno on failure.
*/
static long seccomp_attach_filter(struct sock_fprog *fprog)
{
struct seccomp_filter *filter;
unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
unsigned long total_insns = fprog->len;
long ret;
if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
return -EINVAL;
for (filter = current->seccomp.filter; filter; filter = filter->prev)
total_insns += filter->len + 4; /* include a 4 instr penalty */
if (total_insns > MAX_INSNS_PER_PATH)
return -ENOMEM;
/*
* Installing a seccomp filter requires that the task have
* CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
* This avoids scenarios where unprivileged tasks can affect the
* behavior of privileged children.
*/
if (!current->no_new_privs &&
security_capable_noaudit(current_cred(), current_user_ns(),
CAP_SYS_ADMIN) != 0)
return -EACCES;
/* Allocate a new seccomp_filter */
filter = kzalloc(sizeof(struct seccomp_filter) + fp_size,
GFP_KERNEL|__GFP_NOWARN);
if (!filter)
return -ENOMEM;
atomic_set(&filter->usage, 1);
filter->len = fprog->len;
/* Copy the instructions from fprog. */
ret = -EFAULT;
if (copy_from_user(filter->insns, fprog->filter, fp_size))
goto fail;
/* Check and rewrite the fprog via the skb checker */
ret = sk_chk_filter(filter->insns, filter->len);
if (ret)
goto fail;
/* Check and rewrite the fprog for seccomp use */
ret = seccomp_check_filter(filter->insns, filter->len);
if (ret)
goto fail;
/*
* If there is an existing filter, make it the prev and don't drop its
* task reference.
*/
filter->prev = current->seccomp.filter;
current->seccomp.filter = filter;
return 0;
fail:
kfree(filter);
return ret;
}
/**
* seccomp_attach_user_filter - attaches a user-supplied sock_fprog
* @user_filter: pointer to the user data containing a sock_fprog.
*
* Returns 0 on success and non-zero otherwise.
*/
long seccomp_attach_user_filter(char __user *user_filter)
{
struct sock_fprog fprog;
long ret = -EFAULT;
#ifdef CONFIG_COMPAT
if (is_compat_task()) {
struct compat_sock_fprog fprog32;
if (copy_from_user(&fprog32, user_filter, sizeof(fprog32)))
goto out;
fprog.len = fprog32.len;
fprog.filter = compat_ptr(fprog32.filter);
} else /* falls through to the if below. */
#endif
if (copy_from_user(&fprog, user_filter, sizeof(fprog)))
goto out;
ret = seccomp_attach_filter(&fprog);
out:
return ret;
}
/* get_seccomp_filter - increments the reference count of the filter on @tsk */
void get_seccomp_filter(struct task_struct *tsk)
{
struct seccomp_filter *orig = tsk->seccomp.filter;
if (!orig)
return;
/* Reference count is bounded by the number of total processes. */
atomic_inc(&orig->usage);
}
/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
void put_seccomp_filter(struct task_struct *tsk)
{
struct seccomp_filter *orig = tsk->seccomp.filter;
/* Clean up single-reference branches iteratively. */
while (orig && atomic_dec_and_test(&orig->usage)) {
struct seccomp_filter *freeme = orig;
orig = orig->prev;
kfree(freeme);
}
}
/**
* seccomp_send_sigsys - signals the task to allow in-process syscall emulation
* @syscall: syscall number to send to userland
* @reason: filter-supplied reason code to send to userland (via si_errno)
*
* Forces a SIGSYS with a code of SYS_SECCOMP and related sigsys info.
*/
static void seccomp_send_sigsys(int syscall, int reason)
{
struct siginfo info;
memset(&info, 0, sizeof(info));
info.si_signo = SIGSYS;
info.si_code = SYS_SECCOMP;
info.si_call_addr = (void __user *)KSTK_EIP(current);
info.si_errno = reason;
info.si_arch = syscall_get_arch(current, task_pt_regs(current));
info.si_syscall = syscall;
force_sig_info(SIGSYS, &info, current);
}
#endif /* CONFIG_SECCOMP_FILTER */
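A userland SIGSYS handler can recover the values packed by seccomp_send_sigsys() from siginfo. A hedged sketch follows; the si_syscall, si_arch and si_call_addr fields are the ones added by this series and are assumed to be visible through the libc headers in use (they may not be on older systems):
#include <signal.h>
#include <stdio.h>
#include <string.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1	/* si_code set by seccomp_send_sigsys() */
#endif

static void handle_sigsys(int sig, siginfo_t *info, void *ctx)
{
	if (info->si_code != SYS_SECCOMP)
		return;
	/* si_errno carries the filter's 16-bit SECCOMP_RET_DATA reason;
	 * fprintf() is not async-signal-safe, fine for a demo only. */
	fprintf(stderr, "SIGSYS: syscall=%d reason=%d\n",
		info->si_syscall, info->si_errno);
}

int main(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = handle_sigsys;
	act.sa_flags = SA_SIGINFO;
	return sigaction(SIGSYS, &act, NULL);
}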
/*
* Secure computing mode 1 allows only read/write/exit/sigreturn.
......@@ -31,13 +372,15 @@ static int mode1_syscalls_32[] = {
};
#endif
void __secure_computing(int this_syscall)
int __secure_computing(int this_syscall)
{
int mode = current->seccomp.mode;
int * syscall;
int exit_sig = 0;
int *syscall;
u32 ret;
switch (mode) {
case 1:
case SECCOMP_MODE_STRICT:
syscall = mode1_syscalls;
#ifdef CONFIG_COMPAT
if (is_compat_task())
......@@ -45,9 +388,54 @@ void __secure_computing(int this_syscall)
#endif
do {
if (*syscall == this_syscall)
return;
return 0;
} while (*++syscall);
exit_sig = SIGKILL;
ret = SECCOMP_RET_KILL;
break;
#ifdef CONFIG_SECCOMP_FILTER
case SECCOMP_MODE_FILTER: {
int data;
ret = seccomp_run_filters(this_syscall);
data = ret & SECCOMP_RET_DATA;
ret &= SECCOMP_RET_ACTION;
switch (ret) {
case SECCOMP_RET_ERRNO:
/* Set the low-order 16 bits as an errno. */
syscall_set_return_value(current, task_pt_regs(current),
-data, 0);
goto skip;
case SECCOMP_RET_TRAP:
/* Show the handler the original registers. */
syscall_rollback(current, task_pt_regs(current));
/* Let the filter pass back 16 bits of data. */
seccomp_send_sigsys(this_syscall, data);
goto skip;
case SECCOMP_RET_TRACE:
/* Skip these calls if there is no tracer. */
if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP))
goto skip;
/* Allow the BPF to provide the event message */
ptrace_event(PTRACE_EVENT_SECCOMP, data);
/*
* The delivery of a fatal signal during event
* notification may silently skip tracer notification.
* Terminating the task now avoids executing a system
* call that may not be intended.
*/
if (fatal_signal_pending(current))
break;
return 0;
case SECCOMP_RET_ALLOW:
return 0;
case SECCOMP_RET_KILL:
default:
break;
}
exit_sig = SIGSYS;
break;
}
#endif
default:
BUG();
}
......@@ -55,8 +443,13 @@ void __secure_computing(int this_syscall)
#ifdef SECCOMP_DEBUG
dump_stack();
#endif
audit_seccomp(this_syscall);
do_exit(SIGKILL);
audit_seccomp(this_syscall, exit_sig, ret);
do_exit(exit_sig);
#ifdef CONFIG_SECCOMP_FILTER
skip:
audit_seccomp(this_syscall, exit_sig, ret);
#endif
return -1;
}
long prctl_get_seccomp(void)
......@@ -64,25 +457,48 @@ long prctl_get_seccomp(void)
return current->seccomp.mode;
}
long prctl_set_seccomp(unsigned long seccomp_mode)
/**
* prctl_set_seccomp: configures current->seccomp.mode
* @seccomp_mode: requested mode to use
* @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
*
* This function may be called repeatedly with a @seccomp_mode of
* SECCOMP_MODE_FILTER to install additional filters. Every filter
* successfully installed will be evaluated (in reverse order) for each system
* call the task makes.
*
* Once current->seccomp.mode is non-zero, it may not be changed.
*
* Returns 0 on success or -EINVAL on failure.
*/
long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
{
long ret;
long ret = -EINVAL;
/* can set it only once to be even more secure */
ret = -EPERM;
if (unlikely(current->seccomp.mode))
if (current->seccomp.mode &&
current->seccomp.mode != seccomp_mode)
goto out;
ret = -EINVAL;
if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
current->seccomp.mode = seccomp_mode;
set_thread_flag(TIF_SECCOMP);
switch (seccomp_mode) {
case SECCOMP_MODE_STRICT:
ret = 0;
#ifdef TIF_NOTSC
disable_TSC();
#endif
ret = 0;
break;
#ifdef CONFIG_SECCOMP_FILTER
case SECCOMP_MODE_FILTER:
ret = seccomp_attach_user_filter(filter);
if (ret)
goto out;
break;
#endif
default:
goto out;
}
out:
current->seccomp.mode = seccomp_mode;
set_thread_flag(TIF_SECCOMP);
out:
return ret;
}
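A minimal userspace sketch of the repeated-installation behaviour described above, assuming SECCOMP_MODE_FILTER and PR_SET_NO_NEW_PRIVS as defined by this series (the second prctl() call simply stacks another filter on top of the first; both are run on every system call):
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif

static int attach_allow_all(void)
{
	struct sock_filter insn[] = {
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insn) / sizeof(insn[0]),
		.filter = insn,
	};
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

int main(void)
{
	/* without CAP_SYS_ADMIN, no_new_privs must be set first */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		perror("prctl(NO_NEW_PRIVS)");
	if (attach_allow_all() || attach_allow_all())
		perror("prctl(SECCOMP)");
	return 0;
}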
......@@ -160,7 +160,7 @@ void recalc_sigpending(void)
#define SYNCHRONOUS_MASK \
(sigmask(SIGSEGV) | sigmask(SIGBUS) | sigmask(SIGILL) | \
sigmask(SIGTRAP) | sigmask(SIGFPE))
sigmask(SIGTRAP) | sigmask(SIGFPE) | sigmask(SIGSYS))
int next_signal(struct sigpending *pending, sigset_t *mask)
{
......@@ -2706,6 +2706,13 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_uid, &to->si_uid);
err |= __put_user(from->si_ptr, &to->si_ptr);
break;
#ifdef __ARCH_SIGSYS
case __SI_SYS:
err |= __put_user(from->si_call_addr, &to->si_call_addr);
err |= __put_user(from->si_syscall, &to->si_syscall);
err |= __put_user(from->si_arch, &to->si_arch);
break;
#endif
default: /* this is just in case for now ... */
err |= __put_user(from->si_pid, &to->si_pid);
err |= __put_user(from->si_uid, &to->si_uid);
......
......@@ -1908,7 +1908,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = prctl_get_seccomp();
break;
case PR_SET_SECCOMP:
error = prctl_set_seccomp(arg2);
error = prctl_set_seccomp(arg2, (char __user *)arg3);
break;
case PR_GET_TSC:
error = GET_TSC_CTL(arg2);
......@@ -1979,6 +1979,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = put_user(me->signal->is_child_subreaper,
(int __user *) arg2);
break;
case PR_SET_NO_NEW_PRIVS:
if (arg2 != 1 || arg3 || arg4 || arg5)
return -EINVAL;
current->no_new_privs = 1;
break;
case PR_GET_NO_NEW_PRIVS:
if (arg2 || arg3 || arg4 || arg5)
return -EINVAL;
return current->no_new_privs ? 1 : 0;
default:
error = -EINVAL;
break;
......
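The two new prctl options can be exercised from userland as in the following sketch (the fallback values 38 and 39 are assumptions matching this series, used only when the headers in use predate it):
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

int main(void)
{
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		perror("PR_SET_NO_NEW_PRIVS");
	/* once set, the bit is preserved across fork/exec and cannot be cleared */
	printf("no_new_privs = %d\n", prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0));
	return 0;
}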
......@@ -328,14 +328,6 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
__scm_destroy(scm);
}
/*
* A struct sock_filter is architecture independent.
*/
struct compat_sock_fprog {
u16 len;
compat_uptr_t filter; /* struct sock_filter * */
};
static int do_set_attach_filter(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
{
......
......@@ -38,6 +38,7 @@
#include <linux/filter.h>
#include <linux/reciprocal_div.h>
#include <linux/ratelimit.h>
#include <linux/seccomp.h>
/* No hurry in this branch
*
......@@ -355,6 +356,11 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
A = 0;
continue;
}
#ifdef CONFIG_SECCOMP_FILTER
case BPF_S_ANC_SECCOMP_LD_W:
A = seccomp_bpf_load(fentry->k);
continue;
#endif
default:
WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n",
fentry->code, fentry->jt,
......
......@@ -249,9 +249,6 @@ static int __init init_dns_resolver(void)
struct key *keyring;
int ret;
printk(KERN_NOTICE "Registering the %s key type\n",
key_type_dns_resolver.name);
/* create an override credential set with a special thread keyring in
* which DNS requests are cached
*
......@@ -301,8 +298,6 @@ static void __exit exit_dns_resolver(void)
key_revoke(dns_resolver_cache->thread_keyring);
unregister_key_type(&key_type_dns_resolver);
put_cred(dns_resolver_cache);
printk(KERN_NOTICE "Unregistered %s key type\n",
key_type_dns_resolver.name);
}
module_init(init_dns_resolver)
......
......@@ -26,6 +26,7 @@
#include <linux/cache.h>
#include <linux/audit.h>
#include <net/dst.h>
#include <net/flow.h>
#include <net/xfrm.h>
#include <net/ip.h>
#ifdef CONFIG_XFRM_STATISTICS
......
# Makefile for Linux samples code
obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/ \
hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/
hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
# kbuild trick to avoid linker error. Can be omitted if a module is built.
obj- := dummy.o
hostprogs-$(CONFIG_SECCOMP_FILTER) := bpf-fancy dropper bpf-direct
HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
bpf-fancy-objs := bpf-fancy.o bpf-helper.o
HOSTCFLAGS_dropper.o += -I$(objtree)/usr/include
HOSTCFLAGS_dropper.o += -idirafter $(objtree)/include
dropper-objs := dropper.o
HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
bpf-direct-objs := bpf-direct.o
# Try to match the kernel target.
ifeq ($(CONFIG_64BIT),)
HOSTCFLAGS_bpf-direct.o += -m32
HOSTCFLAGS_dropper.o += -m32
HOSTCFLAGS_bpf-helper.o += -m32
HOSTCFLAGS_bpf-fancy.o += -m32
HOSTLOADLIBES_bpf-direct += -m32
HOSTLOADLIBES_bpf-fancy += -m32
HOSTLOADLIBES_dropper += -m32
endif
# Tell kbuild to always build the programs
always := $(hostprogs-y)
/*
* Seccomp filter example for x86 (32-bit and 64-bit) with BPF macros
*
* Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
* Author: Will Drewry <wad@chromium.org>
*
* The code may be used by anyone for any purpose,
* and can serve as a starting point for developing
* applications using prctl(PR_SET_SECCOMP, 2, ...).
*/
#if defined(__i386__) || defined(__x86_64__)
#define SUPPORTED_ARCH 1
#endif
#if defined(SUPPORTED_ARCH)
#define __USE_GNU 1
#define _GNU_SOURCE 1
#include <linux/types.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/unistd.h>
#include <signal.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>
#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
#define syscall_nr (offsetof(struct seccomp_data, nr))
#if defined(__i386__)
#define REG_RESULT REG_EAX
#define REG_SYSCALL REG_EAX
#define REG_ARG0 REG_EBX
#define REG_ARG1 REG_ECX
#define REG_ARG2 REG_EDX
#define REG_ARG3 REG_ESI
#define REG_ARG4 REG_EDI
#define REG_ARG5 REG_EBP
#elif defined(__x86_64__)
#define REG_RESULT REG_RAX
#define REG_SYSCALL REG_RAX
#define REG_ARG0 REG_RDI
#define REG_ARG1 REG_RSI
#define REG_ARG2 REG_RDX
#define REG_ARG3 REG_R10
#define REG_ARG4 REG_R8
#define REG_ARG5 REG_R9
#endif
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1
#endif
static void emulator(int nr, siginfo_t *info, void *void_context)
{
ucontext_t *ctx = (ucontext_t *)(void_context);
int syscall;
char *buf;
ssize_t bytes;
size_t len;
if (info->si_code != SYS_SECCOMP)
return;
if (!ctx)
return;
syscall = ctx->uc_mcontext.gregs[REG_SYSCALL];
buf = (char *) ctx->uc_mcontext.gregs[REG_ARG1];
len = (size_t) ctx->uc_mcontext.gregs[REG_ARG2];
if (syscall != __NR_write)
return;
if (ctx->uc_mcontext.gregs[REG_ARG0] != STDERR_FILENO)
return;
/* Redirect stderr messages to stdout. Doesn't handle EINTR, etc */
ctx->uc_mcontext.gregs[REG_RESULT] = -1;
if (write(STDOUT_FILENO, "[ERR] ", 6) > 0) {
bytes = write(STDOUT_FILENO, buf, len);
ctx->uc_mcontext.gregs[REG_RESULT] = bytes;
}
return;
}
static int install_emulator(void)
{
struct sigaction act;
sigset_t mask;
memset(&act, 0, sizeof(act));
sigemptyset(&mask);
sigaddset(&mask, SIGSYS);
act.sa_sigaction = &emulator;
act.sa_flags = SA_SIGINFO;
if (sigaction(SIGSYS, &act, NULL) < 0) {
perror("sigaction");
return -1;
}
if (sigprocmask(SIG_UNBLOCK, &mask, NULL)) {
perror("sigprocmask");
return -1;
}
return 0;
}
static int install_filter(void)
{
struct sock_filter filter[] = {
/* Grab the system call number */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
/* Jump table for the allowed syscalls */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
#ifdef __NR_sigreturn
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
#endif
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 0, 1),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 3, 2),
/* Check that read is only using stdin. */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 4, 0),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
/* Check that write is only using stdout */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
/* Trap attempts to write to stderr */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 1, 2),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};
struct sock_fprog prog = {
.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
.filter = filter,
};
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
perror("prctl(NO_NEW_PRIVS)");
return 1;
}
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
perror("prctl");
return 1;
}
return 0;
}
#define payload(_c) (_c), sizeof((_c))
int main(int argc, char **argv)
{
char buf[4096];
ssize_t bytes = 0;
if (install_emulator())
return 1;
if (install_filter())
return 1;
syscall(__NR_write, STDOUT_FILENO,
payload("OHAI! WHAT IS YOUR NAME? "));
bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
syscall(__NR_write, STDOUT_FILENO, buf, bytes);
syscall(__NR_write, STDERR_FILENO,
payload("Error message going to STDERR\n"));
return 0;
}
#else /* SUPPORTED_ARCH */
/*
* This sample is x86-only. Since kernel samples are compiled with the
* host toolchain, a non-x86 host will result in using only the main()
* below.
*/
int main(void)
{
return 1;
}
#endif /* SUPPORTED_ARCH */
/*
* Seccomp BPF example using a macro-based generator.
*
* Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
* Author: Will Drewry <wad@chromium.org>
*
* The code may be used by anyone for any purpose,
* and can serve as a starting point for developing
* applications using prctl(PR_ATTACH_SECCOMP_FILTER).
*/
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>
#include "bpf-helper.h"
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
int main(int argc, char **argv)
{
struct bpf_labels l;
static const char msg1[] = "Please type something: ";
static const char msg2[] = "You typed: ";
char buf[256];
struct sock_filter filter[] = {
/* TODO: LOAD_SYSCALL_NR(arch) and enforce an arch */
LOAD_SYSCALL_NR,
SYSCALL(__NR_exit, ALLOW),
SYSCALL(__NR_exit_group, ALLOW),
SYSCALL(__NR_write, JUMP(&l, write_fd)),
SYSCALL(__NR_read, JUMP(&l, read)),
DENY, /* Don't pass through into a label */
LABEL(&l, read),
ARG(0),
JNE(STDIN_FILENO, DENY),
ARG(1),
JNE((unsigned long)buf, DENY),
ARG(2),
JGE(sizeof(buf), DENY),
ALLOW,
LABEL(&l, write_fd),
ARG(0),
JEQ(STDOUT_FILENO, JUMP(&l, write_buf)),
JEQ(STDERR_FILENO, JUMP(&l, write_buf)),
DENY,
LABEL(&l, write_buf),
ARG(1),
JEQ((unsigned long)msg1, JUMP(&l, msg1_len)),
JEQ((unsigned long)msg2, JUMP(&l, msg2_len)),
JEQ((unsigned long)buf, JUMP(&l, buf_len)),
DENY,
LABEL(&l, msg1_len),
ARG(2),
JLT(sizeof(msg1), ALLOW),
DENY,
LABEL(&l, msg2_len),
ARG(2),
JLT(sizeof(msg2), ALLOW),
DENY,
LABEL(&l, buf_len),
ARG(2),
JLT(sizeof(buf), ALLOW),
DENY,
};
struct sock_fprog prog = {
.filter = filter,
.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
};
ssize_t bytes;
bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter));
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
perror("prctl(NO_NEW_PRIVS)");
return 1;
}
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
perror("prctl(SECCOMP)");
return 1;
}
syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1));
bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1);
bytes = (bytes > 0 ? bytes : 0);
syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2));
syscall(__NR_write, STDERR_FILENO, buf, bytes);
/* Now get killed */
syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2);
return 0;
}
/*
* Seccomp BPF helper functions
*
* Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
* Author: Will Drewry <wad@chromium.org>
*
* The code may be used by anyone for any purpose,
* and can serve as a starting point for developing
* applications using prctl(PR_ATTACH_SECCOMP_FILTER).
*/
#include <stdio.h>
#include <string.h>
#include "bpf-helper.h"
int bpf_resolve_jumps(struct bpf_labels *labels,
struct sock_filter *filter, size_t count)
{
struct sock_filter *begin = filter;
__u8 insn = count - 1;
if (count < 1)
return -1;
/*
* Walk it once, backwards, to build the label table and do fixups.
* Since backward jumps are disallowed by BPF, this is easy.
*/
filter += insn;
for (; filter >= begin; --insn, --filter) {
if (filter->code != (BPF_JMP+BPF_JA))
continue;
switch ((filter->jt<<8)|filter->jf) {
case (JUMP_JT<<8)|JUMP_JF:
if (labels->labels[filter->k].location == 0xffffffff) {
fprintf(stderr, "Unresolved label: '%s'\n",
labels->labels[filter->k].label);
return 1;
}
filter->k = labels->labels[filter->k].location -
(insn + 1);
filter->jt = 0;
filter->jf = 0;
continue;
case (LABEL_JT<<8)|LABEL_JF:
if (labels->labels[filter->k].location != 0xffffffff) {
fprintf(stderr, "Duplicate label use: '%s'\n",
labels->labels[filter->k].label);
return 1;
}
labels->labels[filter->k].location = insn;
filter->k = 0; /* fall through */
filter->jt = 0;
filter->jf = 0;
continue;
}
}
return 0;
}
/* Simple lookup table for labels. */
__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label)
{
struct __bpf_label *begin = labels->labels, *end;
int id;
if (labels->count == 0) {
begin->label = label;
begin->location = 0xffffffff;
labels->count++;
return 0;
}
end = begin + labels->count;
for (id = 0; begin < end; ++begin, ++id) {
if (!strcmp(label, begin->label))
return id;
}
begin->label = label;
begin->location = 0xffffffff;
labels->count++;
return id;
}
void seccomp_bpf_print(struct sock_filter *filter, size_t count)
{
struct sock_filter *end = filter + count;
for ( ; filter < end; ++filter)
printf("{ code=%u,jt=%u,jf=%u,k=%u },\n",
filter->code, filter->jt, filter->jf, filter->k);
}
/*
* Example wrapper around BPF macros.
*
* Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
* Author: Will Drewry <wad@chromium.org>
*
* The code may be used by anyone for any purpose,
* and can serve as a starting point for developing
* applications using prctl(PR_SET_SECCOMP, 2, ...).
*
* No guarantees are provided with respect to the correctness
* or functionality of this code.
*/
#ifndef __BPF_HELPER_H__
#define __BPF_HELPER_H__
#include <asm/bitsperlong.h> /* for __BITS_PER_LONG */
#include <endian.h>
#include <linux/filter.h>
#include <linux/seccomp.h> /* for seccomp_data */
#include <linux/types.h>
#include <linux/unistd.h>
#include <stddef.h>
#define BPF_LABELS_MAX 256
struct bpf_labels {
int count;
struct __bpf_label {
const char *label;
__u32 location;
} labels[BPF_LABELS_MAX];
};
int bpf_resolve_jumps(struct bpf_labels *labels,
struct sock_filter *filter, size_t count);
__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label);
void seccomp_bpf_print(struct sock_filter *filter, size_t count);
#define JUMP_JT 0xff
#define JUMP_JF 0xff
#define LABEL_JT 0xfe
#define LABEL_JF 0xfe
#define ALLOW \
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
#define DENY \
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
#define JUMP(labels, label) \
BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
JUMP_JT, JUMP_JF)
#define LABEL(labels, label) \
BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
LABEL_JT, LABEL_JF)
#define SYSCALL(nr, jt) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \
jt
/* Lame, but just an example */
#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label)
#define EXPAND(...) __VA_ARGS__
/* Map all width-sensitive operations */
#if __BITS_PER_LONG == 32
#define JEQ(x, jt) JEQ32(x, EXPAND(jt))
#define JNE(x, jt) JNE32(x, EXPAND(jt))
#define JGT(x, jt) JGT32(x, EXPAND(jt))
#define JLT(x, jt) JLT32(x, EXPAND(jt))
#define JGE(x, jt) JGE32(x, EXPAND(jt))
#define JLE(x, jt) JLE32(x, EXPAND(jt))
#define JA(x, jt) JA32(x, EXPAND(jt))
#define ARG(i) ARG_32(i)
#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
#elif __BITS_PER_LONG == 64
/* Ensure that we load the logically correct offset. */
#if __BYTE_ORDER == __LITTLE_ENDIAN
#define ENDIAN(_lo, _hi) _lo, _hi
#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32)
#elif __BYTE_ORDER == __BIG_ENDIAN
#define ENDIAN(_lo, _hi) _hi, _lo
#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32)
#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
#else
#error "Unknown endianness"
#endif
union arg64 {
struct {
__u32 ENDIAN(lo32, hi32);
};
__u64 u64;
};
#define JEQ(x, jt) \
JEQ64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define JGT(x, jt) \
JGT64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define JGE(x, jt) \
JGE64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define JNE(x, jt) \
JNE64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define JLT(x, jt) \
JLT64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define JLE(x, jt) \
JLE64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define JA(x, jt) \
JA64(((union arg64){.u64 = (x)}).lo32, \
((union arg64){.u64 = (x)}).hi32, \
EXPAND(jt))
#define ARG(i) ARG_64(i)
#else
#error __BITS_PER_LONG value unusable.
#endif
/* Loads the arg into A */
#define ARG_32(idx) \
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx))
/* Loads hi into A and lo in X */
#define ARG_64(idx) \
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx)), \
BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, HI_ARG(idx)), \
BPF_STMT(BPF_ST, 1) /* hi -> M[1] */
#define JEQ32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \
jt
#define JNE32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \
jt
/* Checks the lo, then swaps to check the hi. A=lo,X=hi */
#define JEQ64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define JNE64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define JA32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \
jt
#define JA64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define JGE32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \
jt
#define JLT32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
jt
/* Shortcut checking if hi > arg.hi. */
#define JGE64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define JLT64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define JGT32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
jt
#define JLE32(value, jt) \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \
jt
/* Check hi > args.hi first, then do the GE checking */
#define JGT64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define JLE64(lo, hi, jt) \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \
BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
jt, \
BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
#define LOAD_SYSCALL_NR \
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
offsetof(struct seccomp_data, nr))
#endif /* __BPF_HELPER_H__ */
/*
* Naive system call dropper built on seccomp_filter.
*
* Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
* Author: Will Drewry <wad@chromium.org>
*
* The code may be used by anyone for any purpose,
* and can serve as a starting point for developing
* applications using prctl(PR_SET_SECCOMP, 2, ...).
*
* When run, returns the specified errno for the specified
* system call number against the given architecture.
*
* Run this one as root as PR_SET_NO_NEW_PRIVS is not called.
*/
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/unistd.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <unistd.h>
static int install_filter(int nr, int arch, int error)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
(offsetof(struct seccomp_data, arch))),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, arch, 0, 3),
BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
(offsetof(struct seccomp_data, nr))),
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1),
BPF_STMT(BPF_RET+BPF_K,
SECCOMP_RET_ERRNO|(error & SECCOMP_RET_DATA)),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
.filter = filter,
};
if (prctl(PR_SET_SECCOMP, 2, &prog)) {
perror("prctl");
return 1;
}
return 0;
}
int main(int argc, char **argv)
{
if (argc < 5) {
fprintf(stderr, "Usage:\n"
"dropper <syscall_nr> <arch> <errno> <prog> [<args>]\n"
"Hint: AUDIT_ARCH_I386: 0x%X\n"
" AUDIT_ARCH_X86_64: 0x%X\n"
"\n", AUDIT_ARCH_I386, AUDIT_ARCH_X86_64);
return 1;
}
if (install_filter(strtol(argv[1], NULL, 0), strtol(argv[2], NULL, 0),
strtol(argv[3], NULL, 0)))
return 1;
execv(argv[4], &argv[4]);
printf("Failed to execv\n");
return 255;
}
......@@ -4,73 +4,7 @@
menu "Security options"
config KEYS
bool "Enable access key retention support"
help
This option provides support for retaining authentication tokens and
access keys in the kernel.
It also includes provision of methods by which such keys might be
associated with a process so that network filesystems, encryption
support and the like can find them.
Furthermore, a special type of key is available that acts as keyring:
a searchable sequence of keys. Each process is equipped with access
to five standard keyrings: UID-specific, GID-specific, session,
process and thread.
If you are unsure as to whether this is required, answer N.
config TRUSTED_KEYS
tristate "TRUSTED KEYS"
depends on KEYS && TCG_TPM
select CRYPTO
select CRYPTO_HMAC
select CRYPTO_SHA1
help
This option provides support for creating, sealing, and unsealing
keys in the kernel. Trusted keys are random number symmetric keys,
generated and RSA-sealed by the TPM. The TPM only unseals the keys
if the boot PCRs and other criteria match. Userspace will only ever
see encrypted blobs.
If you are unsure as to whether this is required, answer N.
config ENCRYPTED_KEYS
tristate "ENCRYPTED KEYS"
depends on KEYS
select CRYPTO
select CRYPTO_HMAC
select CRYPTO_AES
select CRYPTO_CBC
select CRYPTO_SHA256
select CRYPTO_RNG
help
This option provides support for creating/encrypting/decrypting keys
in the kernel. Encrypted keys are kernel generated random numbers,
which are encrypted/decrypted with a 'master' symmetric key. The
'master' key can be either a trusted-key or user-key type.
Userspace only ever sees/stores encrypted blobs.
If you are unsure as to whether this is required, answer N.
config KEYS_DEBUG_PROC_KEYS
bool "Enable the /proc/keys file by which keys may be viewed"
depends on KEYS
help
This option turns on support for the /proc/keys file - through which
can be listed all the keys on the system that are viewable by the
reading process.
The only keys included in the list are those that grant View
permission to the reading process whether or not it possesses them.
Note that LSM security checks are still performed, and may further
filter out keys that the current process is not authorised to view.
Only key attributes are listed here; key payloads are not included in
the resulting table.
If you are unsure as to whether this is required, answer N.
source security/keys/Kconfig
config SECURITY_DMESG_RESTRICT
bool "Restrict unprivileged access to the kernel syslog"
......
......@@ -111,7 +111,7 @@ static const char *const aa_audit_type[] = {
static void audit_pre(struct audit_buffer *ab, void *ca)
{
struct common_audit_data *sa = ca;
struct task_struct *tsk = sa->tsk ? sa->tsk : current;
struct task_struct *tsk = sa->aad->tsk ? sa->aad->tsk : current;
if (aa_g_audit_header) {
audit_log_format(ab, "apparmor=");
......@@ -149,6 +149,12 @@ static void audit_pre(struct audit_buffer *ab, void *ca)
audit_log_format(ab, " name=");
audit_log_untrustedstring(ab, sa->aad->name);
}
if (sa->aad->tsk) {
audit_log_format(ab, " pid=%d comm=", tsk->pid);
audit_log_untrustedstring(ab, tsk->comm);
}
}
/**
......@@ -205,7 +211,8 @@ int aa_audit(int type, struct aa_profile *profile, gfp_t gfp,
aa_audit_msg(type, sa, cb);
if (sa->aad->type == AUDIT_APPARMOR_KILL)
(void)send_sig_info(SIGKILL, NULL, sa->tsk ? sa->tsk : current);
(void)send_sig_info(SIGKILL, NULL,
sa->aad->tsk ? sa->aad->tsk : current);
if (sa->aad->type == AUDIT_APPARMOR_ALLOWED)
return complain_error(sa->aad->error);
......
......@@ -65,10 +65,10 @@ static int audit_caps(struct aa_profile *profile, struct task_struct *task,
int type = AUDIT_APPARMOR_AUTO;
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, CAP);
sa.type = LSM_AUDIT_DATA_CAP;
sa.aad = &aad;
sa.tsk = task;
sa.u.cap = cap;
sa.aad->tsk = task;
sa.aad->op = OP_CAPABLE;
sa.aad->error = error;
......
......@@ -394,6 +394,11 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
new_profile = find_attach(ns, &ns->base.profiles, name);
if (!new_profile)
goto cleanup;
/*
* NOTE: Domain transitions from unconfined are allowed
* even when no_new_privs is set because this always results
* in a further reduction of permissions.
*/
goto apply;
}
......@@ -455,6 +460,16 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
/* fail exec */
error = -EACCES;
/*
* Policy has specified a domain transition, if no_new_privs then
* fail the exec.
*/
if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) {
aa_put_profile(new_profile);
error = -EPERM;
goto cleanup;
}
if (!new_profile)
goto audit;
......@@ -609,6 +624,14 @@ int aa_change_hat(const char *hats[], int count, u64 token, bool permtest)
const char *target = NULL, *info = NULL;
int error = 0;
/*
* Fail explicitly requested domain transitions if no_new_privs.
* There is no exception for unconfined as change_hat is not
* available.
*/
if (current->no_new_privs)
return -EPERM;
/* released below */
cred = get_current_cred();
cxt = cred->security;
......@@ -750,6 +773,18 @@ int aa_change_profile(const char *ns_name, const char *hname, bool onexec,
cxt = cred->security;
profile = aa_cred_profile(cred);
/*
* Fail explicitly requested domain transitions if no_new_privs
* and not unconfined.
* Domain transitions from unconfined are allowed even when
* no_new_privs is set because this always results in a reduction
* of permissions.
*/
if (current->no_new_privs && !unconfined(profile)) {
put_cred(cred);
return -EPERM;
}
if (ns_name) {
/* released below */
ns = aa_find_namespace(profile->ns, ns_name);
......
......@@ -108,7 +108,7 @@ int aa_audit_file(struct aa_profile *profile, struct file_perms *perms,
int type = AUDIT_APPARMOR_AUTO;
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
aad.op = op,
aad.fs.request = request;
......
......@@ -110,6 +110,7 @@ struct apparmor_audit_data {
void *profile;
const char *name;
const char *info;
struct task_struct *tsk;
union {
void *target;
struct {
......
......@@ -42,7 +42,7 @@ static int aa_audit_ptrace(struct aa_profile *profile,
{
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
aad.op = OP_PTRACE;
aad.target = target;
......
......@@ -66,7 +66,7 @@ void aa_info_message(const char *str)
if (audit_enabled) {
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
aad.info = str;
aa_audit_msg(AUDIT_APPARMOR_STATUS, &sa, NULL);
......
......@@ -373,7 +373,7 @@ static int apparmor_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
AA_MAY_META_READ);
}
static int apparmor_dentry_open(struct file *file, const struct cred *cred)
static int apparmor_file_open(struct file *file, const struct cred *cred)
{
struct aa_file_cxt *fcxt = file->f_security;
struct aa_profile *profile;
......@@ -589,7 +589,7 @@ static int apparmor_setprocattr(struct task_struct *task, char *name,
} else {
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
aad.op = OP_SETPROCATTR;
aad.info = name;
......@@ -640,9 +640,9 @@ static struct security_operations apparmor_ops = {
.path_chmod = apparmor_path_chmod,
.path_chown = apparmor_path_chown,
.path_truncate = apparmor_path_truncate,
.dentry_open = apparmor_dentry_open,
.inode_getattr = apparmor_inode_getattr,
.file_open = apparmor_file_open,
.file_permission = apparmor_file_permission,
.file_alloc_security = apparmor_file_alloc_security,
.file_free_security = apparmor_file_free_security,
......
......@@ -94,6 +94,8 @@ static int d_namespace_path(struct path *path, char *buf, int buflen,
* be returned.
*/
if (!res || IS_ERR(res)) {
if (PTR_ERR(res) == -ENAMETOOLONG)
return -ENAMETOOLONG;
connected = 0;
res = dentry_path_raw(path->dentry, buf, buflen);
if (IS_ERR(res)) {
......
......@@ -903,6 +903,10 @@ struct aa_profile *aa_lookup_profile(struct aa_namespace *ns, const char *hname)
profile = aa_get_profile(__lookup_profile(&ns->base, hname));
read_unlock(&ns->lock);
/* the unconfined profile is not in the regular profile list */
if (!profile && strcmp(hname, "unconfined") == 0)
profile = aa_get_profile(ns->unconfined);
/* refcount released by caller */
return profile;
}
......@@ -965,7 +969,7 @@ static int audit_policy(int op, gfp_t gfp, const char *name, const char *info,
{
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
aad.op = op;
aad.name = name;
......
......@@ -95,7 +95,7 @@ static int audit_iface(struct aa_profile *new, const char *name,
struct aa_profile *profile = __aa_current_profile();
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
if (e)
aad.iface.pos = e->pos - e->start;
......
......@@ -52,7 +52,7 @@ static int audit_resource(struct aa_profile *profile, unsigned int resource,
struct common_audit_data sa;
struct apparmor_audit_data aad = {0,};
COMMON_AUDIT_DATA_INIT(&sa, NONE);
sa.type = LSM_AUDIT_DATA_NONE;
sa.aad = &aad;
aad.op = OP_SETRLIMIT,
aad.rlim.rlim = resource;
......
......@@ -348,7 +348,7 @@ static int cap_file_receive(struct file *file)
return 0;
}
static int cap_dentry_open(struct file *file, const struct cred *cred)
static int cap_file_open(struct file *file, const struct cred *cred)
{
return 0;
}
......@@ -956,7 +956,7 @@ void __init security_fixup_ops(struct security_operations *ops)
set_to_cap_if_null(ops, file_set_fowner);
set_to_cap_if_null(ops, file_send_sigiotask);
set_to_cap_if_null(ops, file_receive);
set_to_cap_if_null(ops, dentry_open);
set_to_cap_if_null(ops, file_open);
set_to_cap_if_null(ops, task_create);
set_to_cap_if_null(ops, task_free);
set_to_cap_if_null(ops, cred_alloc_blank);
......
......@@ -512,14 +512,17 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
/* Don't let someone trace a set[ug]id/setpcap binary with the revised
* credentials unless they have the appropriate permit
* credentials unless they have the appropriate permit.
*
* In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
*/
if ((new->euid != old->uid ||
new->egid != old->gid ||
!cap_issubset(new->cap_permitted, old->cap_permitted)) &&
bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
/* downgrade; they get no more than they had, and maybe less */
if (!capable(CAP_SETUID)) {
if (!capable(CAP_SETUID) ||
(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
new->euid = new->uid;
new->egid = new->gid;
}
......
......@@ -194,7 +194,9 @@ int ima_bprm_check(struct linux_binprm *bprm)
{
int rc;
rc = process_measurement(bprm->file, bprm->filename,
rc = process_measurement(bprm->file,
(strcmp(bprm->filename, bprm->interp) == 0) ?
bprm->filename : bprm->interp,
MAY_EXEC, BPRM_CHECK);
return 0;
}
......
#
# Key management configuration
#
config KEYS
bool "Enable access key retention support"
help
This option provides support for retaining authentication tokens and
access keys in the kernel.
It also includes provision of methods by which such keys might be
associated with a process so that network filesystems, encryption
support and the like can find them.
Furthermore, a special type of key is available that acts as keyring:
a searchable sequence of keys. Each process is equipped with access
to five standard keyrings: UID-specific, GID-specific, session,
process and thread.
If you are unsure as to whether this is required, answer N.
config TRUSTED_KEYS
tristate "TRUSTED KEYS"
depends on KEYS && TCG_TPM
select CRYPTO
select CRYPTO_HMAC
select CRYPTO_SHA1
help
This option provides support for creating, sealing, and unsealing
keys in the kernel. Trusted keys are random number symmetric keys,
generated and RSA-sealed by the TPM. The TPM only unseals the keys
if the boot PCRs and other criteria match. Userspace will only ever
see encrypted blobs.
If you are unsure as to whether this is required, answer N.
config ENCRYPTED_KEYS
tristate "ENCRYPTED KEYS"
depends on KEYS
select CRYPTO
select CRYPTO_HMAC
select CRYPTO_AES
select CRYPTO_CBC
select CRYPTO_SHA256
select CRYPTO_RNG
help
This option provides support for creating/encrypting/decrypting keys
in the kernel. Encrypted keys are kernel generated random numbers,
which are encrypted/decrypted with a 'master' symmetric key. The
'master' key can be either a trusted-key or user-key type.
Userspace only ever sees/stores encrypted blobs.
If you are unsure as to whether this is required, answer N.
config KEYS_DEBUG_PROC_KEYS
bool "Enable the /proc/keys file by which keys may be viewed"
depends on KEYS
help
This option turns on support for the /proc/keys file - through which
can be listed all the keys on the system that are viewable by the
reading process.
The only keys included in the list are those that grant View
permission to the reading process whether or not it possesses them.
Note that LSM security checks are still performed, and may further
filter out keys that the current process is not authorised to view.
Only key attributes are listed here; key payloads are not included in
the resulting table.
If you are unsure as to whether this is required, answer N.
......@@ -2,6 +2,9 @@
# Makefile for key management
#
#
# Core
#
obj-y := \
gc.o \
key.o \
......@@ -12,9 +15,12 @@ obj-y := \
request_key.o \
request_key_auth.o \
user_defined.o
obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
obj-$(CONFIG_KEYS_COMPAT) += compat.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSCTL) += sysctl.o
#
# Key types
#
obj-$(CONFIG_TRUSTED_KEYS) += trusted.o
obj-$(CONFIG_ENCRYPTED_KEYS) += encrypted-keys/
......@@ -135,6 +135,9 @@ asmlinkage long compat_sys_keyctl(u32 option,
return compat_keyctl_instantiate_key_iov(
arg2, compat_ptr(arg3), arg4, arg5);
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key(arg2);
default:
return -EOPNOTSUPP;
}
......
......@@ -71,6 +71,15 @@ void key_schedule_gc(time_t gc_at)
}
}
/*
* Schedule a dead links collection run.
*/
void key_schedule_gc_links(void)
{
set_bit(KEY_GC_KEY_EXPIRED, &key_gc_flags);
queue_work(system_nrt_wq, &key_gc_work);
}
/*
* Some key's cleanup time was met after it expired, so we need to get the
* reaper to go through a cycle finding expired keys.
......@@ -79,8 +88,7 @@ static void key_gc_timer_func(unsigned long data)
{
kenter("");
key_gc_next_run = LONG_MAX;
set_bit(KEY_GC_KEY_EXPIRED, &key_gc_flags);
queue_work(system_nrt_wq, &key_gc_work);
key_schedule_gc_links();
}
/*
......@@ -131,12 +139,12 @@ void key_gc_keytype(struct key_type *ktype)
static void key_gc_keyring(struct key *keyring, time_t limit)
{
struct keyring_list *klist;
struct key *key;
int loop;
kenter("%x", key_serial(keyring));
if (test_bit(KEY_FLAG_REVOKED, &keyring->flags))
if (keyring->flags & ((1 << KEY_FLAG_INVALIDATED) |
(1 << KEY_FLAG_REVOKED)))
goto dont_gc;
/* scan the keyring looking for dead keys */
......@@ -148,9 +156,8 @@ static void key_gc_keyring(struct key *keyring, time_t limit)
loop = klist->nkeys;
smp_rmb();
for (loop--; loop >= 0; loop--) {
key = klist->keys[loop];
if (test_bit(KEY_FLAG_DEAD, &key->flags) ||
(key->expiry > 0 && key->expiry <= limit))
struct key *key = rcu_dereference(klist->keys[loop]);
if (key_is_dead(key, limit))
goto do_gc;
}
......@@ -168,38 +175,45 @@ static void key_gc_keyring(struct key *keyring, time_t limit)
}
/*
* Garbage collect an unreferenced, detached key
* Garbage collect a list of unreferenced, detached keys
*/
static noinline void key_gc_unused_key(struct key *key)
static noinline void key_gc_unused_keys(struct list_head *keys)
{
key_check(key);
security_key_free(key);
/* deal with the user's key tracking and quota */
if (test_bit(KEY_FLAG_IN_QUOTA, &key->flags)) {
spin_lock(&key->user->lock);
key->user->qnkeys--;
key->user->qnbytes -= key->quotalen;
spin_unlock(&key->user->lock);
}
while (!list_empty(keys)) {
struct key *key =
list_entry(keys->next, struct key, graveyard_link);
list_del(&key->graveyard_link);
kdebug("- %u", key->serial);
key_check(key);
security_key_free(key);
/* deal with the user's key tracking and quota */
if (test_bit(KEY_FLAG_IN_QUOTA, &key->flags)) {
spin_lock(&key->user->lock);
key->user->qnkeys--;
key->user->qnbytes -= key->quotalen;
spin_unlock(&key->user->lock);
}
atomic_dec(&key->user->nkeys);
if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags))
atomic_dec(&key->user->nikeys);
atomic_dec(&key->user->nkeys);
if (test_bit(KEY_FLAG_INSTANTIATED, &key->flags))
atomic_dec(&key->user->nikeys);
key_user_put(key->user);
key_user_put(key->user);
/* now throw away the key memory */
if (key->type->destroy)
key->type->destroy(key);
/* now throw away the key memory */
if (key->type->destroy)
key->type->destroy(key);
kfree(key->description);
kfree(key->description);
#ifdef KEY_DEBUGGING
key->magic = KEY_DEBUG_MAGIC_X;
key->magic = KEY_DEBUG_MAGIC_X;
#endif
kmem_cache_free(key_jar, key);
kmem_cache_free(key_jar, key);
}
}
/*
......@@ -211,6 +225,7 @@ static noinline void key_gc_unused_key(struct key *key)
*/
static void key_garbage_collector(struct work_struct *work)
{
static LIST_HEAD(graveyard);
static u8 gc_state; /* Internal persistent state */
#define KEY_GC_REAP_AGAIN 0x01 /* - Need another cycle */
#define KEY_GC_REAPING_LINKS 0x02 /* - We need to reap links */
......@@ -316,15 +331,22 @@ static void key_garbage_collector(struct work_struct *work)
key_schedule_gc(new_timer);
}
if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2)) {
/* Make sure everyone revalidates their keys if we marked a
* bunch as being dead and make sure all keyring ex-payloads
* are destroyed.
if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) ||
!list_empty(&graveyard)) {
/* Make sure that all pending keyring payload destructions are
* fulfilled and that people aren't now looking at dead or
* dying keys that they don't have a reference upon or a link
* to.
*/
kdebug("dead sync");
kdebug("gc sync");
synchronize_rcu();
}
if (!list_empty(&graveyard)) {
kdebug("gc keys");
key_gc_unused_keys(&graveyard);
}
if (unlikely(gc_state & (KEY_GC_REAPING_DEAD_1 |
KEY_GC_REAPING_DEAD_2))) {
if (!(gc_state & KEY_GC_FOUND_DEAD_KEY)) {
......@@ -359,7 +381,7 @@ static void key_garbage_collector(struct work_struct *work)
rb_erase(&key->serial_node, &key_serial_tree);
spin_unlock(&key_serial_lock);
key_gc_unused_key(key);
list_add_tail(&key->graveyard_link, &graveyard);
gc_state |= KEY_GC_REAP_AGAIN;
goto maybe_resched;
......
......@@ -152,7 +152,8 @@ extern long join_session_keyring(const char *name);
extern struct work_struct key_gc_work;
extern unsigned key_gc_delay;
extern void keyring_gc(struct key *keyring, time_t limit);
extern void key_schedule_gc(time_t expiry_at);
extern void key_schedule_gc(time_t gc_at);
extern void key_schedule_gc_links(void);
extern void key_gc_keytype(struct key_type *ktype);
extern int key_task_permission(const key_ref_t key_ref,
......@@ -196,6 +197,17 @@ extern struct key *request_key_auth_new(struct key *target,
extern struct key *key_get_instantiation_authkey(key_serial_t target_id);
/*
* Determine whether a key is dead.
*/
static inline bool key_is_dead(struct key *key, time_t limit)
{
return
key->flags & ((1 << KEY_FLAG_DEAD) |
(1 << KEY_FLAG_INVALIDATED)) ||
(key->expiry > 0 && key->expiry <= limit);
}
/*
* keyctl() functions
*/
......@@ -225,6 +237,7 @@ extern long keyctl_reject_key(key_serial_t, unsigned, unsigned, key_serial_t);
extern long keyctl_instantiate_key_iov(key_serial_t,
const struct iovec __user *,
unsigned, key_serial_t);
extern long keyctl_invalidate_key(key_serial_t);
extern long keyctl_instantiate_key_common(key_serial_t,
const struct iovec __user *,
......
......@@ -954,6 +954,28 @@ void key_revoke(struct key *key)
}
EXPORT_SYMBOL(key_revoke);
/**
* key_invalidate - Invalidate a key.
* @key: The key to be invalidated.
*
* Mark a key as being invalidated and have it cleaned up immediately. The key
* is ignored by all searches and other operations from this point.
*/
void key_invalidate(struct key *key)
{
kenter("%d", key_serial(key));
key_check(key);
if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) {
down_write_nested(&key->sem, 1);
if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags))
key_schedule_gc_links();
up_write(&key->sem);
}
}
EXPORT_SYMBOL(key_invalidate);
/**
* register_key_type - Register a type of key.
* @ktype: The new key type.
......@@ -980,6 +1002,8 @@ int register_key_type(struct key_type *ktype)
/* store the type */
list_add(&ktype->link, &key_types_list);
pr_notice("Key type %s registered\n", ktype->name);
ret = 0;
out:
......@@ -1002,6 +1026,7 @@ void unregister_key_type(struct key_type *ktype)
list_del_init(&ktype->link);
downgrade_write(&key_types_sem);
key_gc_keytype(ktype);
pr_notice("Key type %s unregistered\n", ktype->name);
up_read(&key_types_sem);
}
EXPORT_SYMBOL(unregister_key_type);
......
......@@ -374,6 +374,37 @@ long keyctl_revoke_key(key_serial_t id)
return ret;
}
/*
* Invalidate a key.
*
* The key must grant the caller Invalidate permission for this to work.
* The key and any links to the key will be automatically garbage collected
* immediately.
*
* If successful, 0 is returned.
*/
long keyctl_invalidate_key(key_serial_t id)
{
key_ref_t key_ref;
long ret;
kenter("%d", id);
key_ref = lookup_user_key(id, 0, KEY_SEARCH);
if (IS_ERR(key_ref)) {
ret = PTR_ERR(key_ref);
goto error;
}
key_invalidate(key_ref_to_ptr(key_ref));
ret = 0;
key_ref_put(key_ref);
error:
kleave(" = %ld", ret);
return ret;
}
/*
* Clear the specified keyring, creating an empty process keyring if one of the
* special keyring IDs is used.
......@@ -1622,6 +1653,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3,
(unsigned) arg4,
(key_serial_t) arg5);
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key((key_serial_t) arg2);
default:
return -EOPNOTSUPP;
}
......
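From userland, the new operation can be reached through the raw keyctl() syscall, as in this sketch (KEYCTL_INVALIDATE is assumed to be 21, matching this series; no keyutils wrapper is assumed):
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef KEYCTL_INVALIDATE
#define KEYCTL_INVALIDATE 21	/* assumed value from this series */
#endif

int main(int argc, char **argv)
{
	int id;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <key-serial>\n", argv[0]);
		return 1;
	}
	id = (int)strtol(argv[1], NULL, 0);
	/* needs Search permission on the key, per lookup_user_key(..., KEY_SEARCH) */
	if (syscall(SYS_keyctl, KEYCTL_INVALIDATE, id) < 0) {
		perror("keyctl(KEYCTL_INVALIDATE)");
		return 1;
	}
	return 0;
}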
......@@ -25,6 +25,15 @@
(keyring)->payload.subscriptions, \
rwsem_is_locked((struct rw_semaphore *)&(keyring)->sem)))
#define rcu_deref_link_locked(klist, index, keyring) \
(rcu_dereference_protected( \
(klist)->keys[index], \
rwsem_is_locked((struct rw_semaphore *)&(keyring)->sem)))
#define MAX_KEYRING_LINKS \
min_t(size_t, USHRT_MAX - 1, \
((PAGE_SIZE - sizeof(struct keyring_list)) / sizeof(struct key *)))
#define KEY_LINK_FIXQUOTA 1UL
/*
......@@ -138,6 +147,11 @@ static int keyring_match(const struct key *keyring, const void *description)
/*
* Clean up a keyring when it is destroyed. Unpublish its name if it had one
* and dispose of its data.
*
* The garbage collector detects the final key_put(), removes the keyring from
* the serial number tree and then does RCU synchronisation before coming here,
* so we shouldn't need to worry about code poking around here with the RCU
* readlock held by this time.
*/
static void keyring_destroy(struct key *keyring)
{
......@@ -154,11 +168,10 @@ static void keyring_destroy(struct key *keyring)
write_unlock(&keyring_name_lock);
}
klist = rcu_dereference_check(keyring->payload.subscriptions,
atomic_read(&keyring->usage) == 0);
klist = rcu_access_pointer(keyring->payload.subscriptions);
if (klist) {
for (loop = klist->nkeys - 1; loop >= 0; loop--)
key_put(klist->keys[loop]);
key_put(rcu_access_pointer(klist->keys[loop]));
kfree(klist);
}
}
......@@ -214,7 +227,8 @@ static long keyring_read(const struct key *keyring,
ret = -EFAULT;
for (loop = 0; loop < klist->nkeys; loop++) {
key = klist->keys[loop];
key = rcu_deref_link_locked(klist, loop,
keyring);
tmp = sizeof(key_serial_t);
if (tmp > buflen)
......@@ -309,6 +323,8 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
bool no_state_check)
{
struct {
/* Need a separate keylist pointer for RCU purposes */
struct key *keyring;
struct keyring_list *keylist;
int kix;
} stack[KEYRING_SEARCH_MAX_DEPTH];
......@@ -366,13 +382,17 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
/* otherwise, the top keyring must not be revoked, expired, or
* negatively instantiated if we are to search it */
key_ref = ERR_PTR(-EAGAIN);
if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
if (kflags & ((1 << KEY_FLAG_INVALIDATED) |
(1 << KEY_FLAG_REVOKED) |
(1 << KEY_FLAG_NEGATIVE)) ||
(keyring->expiry && now.tv_sec >= keyring->expiry))
goto error_2;
/* start processing a new keyring */
descend:
if (test_bit(KEY_FLAG_REVOKED, &keyring->flags))
kflags = keyring->flags;
if (kflags & ((1 << KEY_FLAG_INVALIDATED) |
(1 << KEY_FLAG_REVOKED)))
goto not_this_keyring;
keylist = rcu_dereference(keyring->payload.subscriptions);
......@@ -383,16 +403,17 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
nkeys = keylist->nkeys;
smp_rmb();
for (kix = 0; kix < nkeys; kix++) {
key = keylist->keys[kix];
key = rcu_dereference(keylist->keys[kix]);
kflags = key->flags;
/* ignore keys not of this type */
if (key->type != type)
continue;
/* skip revoked keys and expired keys */
/* skip invalidated, revoked and expired keys */
if (!no_state_check) {
if (kflags & (1 << KEY_FLAG_REVOKED))
if (kflags & ((1 << KEY_FLAG_INVALIDATED) |
(1 << KEY_FLAG_REVOKED)))
continue;
if (key->expiry && now.tv_sec >= key->expiry)
......@@ -426,7 +447,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
nkeys = keylist->nkeys;
smp_rmb();
for (; kix < nkeys; kix++) {
key = keylist->keys[kix];
key = rcu_dereference(keylist->keys[kix]);
if (key->type != &key_type_keyring)
continue;
......@@ -441,6 +462,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
continue;
/* stack the current position */
stack[sp].keyring = keyring;
stack[sp].keylist = keylist;
stack[sp].kix = kix;
sp++;
......@@ -456,6 +478,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
if (sp > 0) {
/* resume the processing of a keyring higher up in the tree */
sp--;
keyring = stack[sp].keyring;
keylist = stack[sp].keylist;
kix = stack[sp].kix + 1;
goto ascend;
......@@ -467,6 +490,10 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
/* we found a viable match */
found:
atomic_inc(&key->usage);
key->last_used_at = now.tv_sec;
keyring->last_used_at = now.tv_sec;
while (sp > 0)
stack[--sp].keyring->last_used_at = now.tv_sec;
key_check(key);
key_ref = make_key_ref(key, possessed);
error_2:
......@@ -531,14 +558,14 @@ key_ref_t __keyring_search_one(key_ref_t keyring_ref,
nkeys = klist->nkeys;
smp_rmb();
for (loop = 0; loop < nkeys ; loop++) {
key = klist->keys[loop];
key = rcu_dereference(klist->keys[loop]);
if (key->type == ktype &&
(!key->type->match ||
key->type->match(key, description)) &&
key_permission(make_key_ref(key, possessed),
perm) == 0 &&
!test_bit(KEY_FLAG_REVOKED, &key->flags)
!(key->flags & ((1 << KEY_FLAG_INVALIDATED) |
(1 << KEY_FLAG_REVOKED)))
)
goto found;
}
......@@ -549,6 +576,8 @@ key_ref_t __keyring_search_one(key_ref_t keyring_ref,
found:
atomic_inc(&key->usage);
keyring->last_used_at = key->last_used_at =
current_kernel_time().tv_sec;
rcu_read_unlock();
return make_key_ref(key, possessed);
}
......@@ -602,6 +631,7 @@ struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
* (ie. it has a zero usage count) */
if (!atomic_inc_not_zero(&keyring->usage))
continue;
keyring->last_used_at = current_kernel_time().tv_sec;
goto out;
}
}
......@@ -654,7 +684,7 @@ static int keyring_detect_cycle(struct key *A, struct key *B)
nkeys = keylist->nkeys;
smp_rmb();
for (; kix < nkeys; kix++) {
key = keylist->keys[kix];
key = rcu_dereference(keylist->keys[kix]);
if (key == A)
goto cycle_detected;
......@@ -711,7 +741,7 @@ static void keyring_unlink_rcu_disposal(struct rcu_head *rcu)
container_of(rcu, struct keyring_list, rcu);
if (klist->delkey != USHRT_MAX)
key_put(klist->keys[klist->delkey]);
key_put(rcu_access_pointer(klist->keys[klist->delkey]));
kfree(klist);
}
......@@ -725,8 +755,9 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
struct keyring_list *klist, *nklist;
unsigned long prealloc;
unsigned max;
time_t lowest_lru;
size_t size;
int loop, ret;
int loop, lru, ret;
kenter("%d,%s,%s,", key_serial(keyring), type->name, description);
......@@ -747,31 +778,39 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
klist = rcu_dereference_locked_keyring(keyring);
/* see if there's a matching key we can displace */
lru = -1;
if (klist && klist->nkeys > 0) {
lowest_lru = TIME_T_MAX;
for (loop = klist->nkeys - 1; loop >= 0; loop--) {
if (klist->keys[loop]->type == type &&
strcmp(klist->keys[loop]->description,
description) == 0
) {
/* found a match - we'll replace this one with
* the new key */
size = sizeof(struct key *) * klist->maxkeys;
size += sizeof(*klist);
BUG_ON(size > PAGE_SIZE);
ret = -ENOMEM;
nklist = kmemdup(klist, size, GFP_KERNEL);
if (!nklist)
goto error_sem;
/* note replacement slot */
klist->delkey = nklist->delkey = loop;
prealloc = (unsigned long)nklist;
struct key *key = rcu_deref_link_locked(klist, loop,
keyring);
if (key->type == type &&
strcmp(key->description, description) == 0) {
/* Found a match - we'll replace the link with
* one to the new key. We record the slot
* position.
*/
klist->delkey = loop;
prealloc = 0;
goto done;
}
if (key->last_used_at < lowest_lru) {
lowest_lru = key->last_used_at;
lru = loop;
}
}
}
/* If the keyring is full then do an LRU discard */
if (klist &&
klist->nkeys == klist->maxkeys &&
klist->maxkeys >= MAX_KEYRING_LINKS) {
kdebug("LRU discard %d\n", lru);
klist->delkey = lru;
prealloc = 0;
goto done;
}
/* check that we aren't going to overrun the user's quota */
ret = key_payload_reserve(keyring,
keyring->datalen + KEYQUOTA_LINK_BYTES);
......@@ -780,20 +819,19 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
if (klist && klist->nkeys < klist->maxkeys) {
/* there's sufficient slack space to append directly */
nklist = NULL;
klist->delkey = klist->nkeys;
prealloc = KEY_LINK_FIXQUOTA;
} else {
/* grow the key list */
max = 4;
if (klist)
if (klist) {
max += klist->maxkeys;
if (max > MAX_KEYRING_LINKS)
max = MAX_KEYRING_LINKS;
BUG_ON(max <= klist->maxkeys);
}
ret = -ENFILE;
if (max > USHRT_MAX - 1)
goto error_quota;
size = sizeof(*klist) + sizeof(struct key *) * max;
if (size > PAGE_SIZE)
goto error_quota;
ret = -ENOMEM;
nklist = kmalloc(size, GFP_KERNEL);
......@@ -813,10 +851,10 @@ int __key_link_begin(struct key *keyring, const struct key_type *type,
}
/* add the key into the new space */
nklist->keys[nklist->delkey] = NULL;
RCU_INIT_POINTER(nklist->keys[nklist->delkey], NULL);
prealloc = (unsigned long)nklist | KEY_LINK_FIXQUOTA;
}
prealloc = (unsigned long)nklist | KEY_LINK_FIXQUOTA;
done:
*_prealloc = prealloc;
kleave(" = 0");
......@@ -862,6 +900,7 @@ void __key_link(struct key *keyring, struct key *key,
unsigned long *_prealloc)
{
struct keyring_list *klist, *nklist;
struct key *discard;
nklist = (struct keyring_list *)(*_prealloc & ~KEY_LINK_FIXQUOTA);
*_prealloc = 0;
......@@ -871,14 +910,16 @@ void __key_link(struct key *keyring, struct key *key,
klist = rcu_dereference_locked_keyring(keyring);
atomic_inc(&key->usage);
keyring->last_used_at = key->last_used_at =
current_kernel_time().tv_sec;
/* there's a matching key we can displace or an empty slot in a newly
* allocated list we can fill */
if (nklist) {
kdebug("replace %hu/%hu/%hu",
kdebug("reissue %hu/%hu/%hu",
nklist->delkey, nklist->nkeys, nklist->maxkeys);
nklist->keys[nklist->delkey] = key;
RCU_INIT_POINTER(nklist->keys[nklist->delkey], key);
rcu_assign_pointer(keyring->payload.subscriptions, nklist);
......@@ -889,9 +930,23 @@ void __key_link(struct key *keyring, struct key *key,
klist->delkey, klist->nkeys, klist->maxkeys);
call_rcu(&klist->rcu, keyring_unlink_rcu_disposal);
}
} else if (klist->delkey < klist->nkeys) {
kdebug("replace %hu/%hu/%hu",
klist->delkey, klist->nkeys, klist->maxkeys);
discard = rcu_dereference_protected(
klist->keys[klist->delkey],
rwsem_is_locked(&keyring->sem));
rcu_assign_pointer(klist->keys[klist->delkey], key);
/* The garbage collector will take care of RCU
* synchronisation */
key_put(discard);
} else {
/* there's sufficient slack space to append directly */
klist->keys[klist->nkeys] = key;
kdebug("append %hu/%hu/%hu",
klist->delkey, klist->nkeys, klist->maxkeys);
RCU_INIT_POINTER(klist->keys[klist->delkey], key);
smp_wmb();
klist->nkeys++;
}
......@@ -998,7 +1053,7 @@ int key_unlink(struct key *keyring, struct key *key)
if (klist) {
/* search the keyring for the key */
for (loop = 0; loop < klist->nkeys; loop++)
if (klist->keys[loop] == key)
if (rcu_access_pointer(klist->keys[loop]) == key)
goto key_is_present;
}
......@@ -1061,7 +1116,7 @@ static void keyring_clear_rcu_disposal(struct rcu_head *rcu)
klist = container_of(rcu, struct keyring_list, rcu);
for (loop = klist->nkeys - 1; loop >= 0; loop--)
key_put(klist->keys[loop]);
key_put(rcu_access_pointer(klist->keys[loop]));
kfree(klist);
}
......@@ -1127,15 +1182,6 @@ static void keyring_revoke(struct key *keyring)
}
}
/*
* Determine whether a key is dead.
*/
static bool key_is_dead(struct key *key, time_t limit)
{
return test_bit(KEY_FLAG_DEAD, &key->flags) ||
(key->expiry > 0 && key->expiry <= limit);
}
/*
* Collect garbage from the contents of a keyring, replacing the old list with
* a new one with the pointers all shuffled down.
......@@ -1161,7 +1207,8 @@ void keyring_gc(struct key *keyring, time_t limit)
/* work out how many subscriptions we're keeping */
keep = 0;
for (loop = klist->nkeys - 1; loop >= 0; loop--)
if (!key_is_dead(klist->keys[loop], limit))
if (!key_is_dead(rcu_deref_link_locked(klist, loop, keyring),
limit))
keep++;
if (keep == klist->nkeys)
......@@ -1182,11 +1229,11 @@ void keyring_gc(struct key *keyring, time_t limit)
*/
keep = 0;
for (loop = klist->nkeys - 1; loop >= 0; loop--) {
key = klist->keys[loop];
key = rcu_deref_link_locked(klist, loop, keyring);
if (!key_is_dead(key, limit)) {
if (keep >= max)
goto discard_new;
new->keys[keep++] = key_get(key);
RCU_INIT_POINTER(new->keys[keep++], key_get(key));
}
}
new->nkeys = keep;
......
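The keyring.c hunks above add last_used_at bookkeeping and use it for the new LRU discard in __key_link_begin(): while scanning for a link to displace, the loop also remembers the slot with the oldest last_used_at, and once a keyring already holds MAX_KEYRING_LINKS entries that slot is reused instead of growing the list further. A minimal user-space sketch of the same selection step, assuming a made-up demo_link table rather than the kernel's keyring_list:

/* Sketch: pick the least-recently-used slot, as __key_link_begin() now does
 * when a keyring is full. struct demo_link and the fixed-size table are
 * illustrative stand-ins, not the kernel structures. */
#include <stdio.h>
#include <time.h>

#define DEMO_MAX_LINKS 4

struct demo_link {
	const char *description;
	time_t last_used_at;
};

static int pick_lru_slot(const struct demo_link *links, int nr)
{
	time_t lowest_lru = 0;
	int lru = -1, i;

	for (i = nr - 1; i >= 0; i--) {
		if (lru < 0 || links[i].last_used_at < lowest_lru) {
			lowest_lru = links[i].last_used_at;
			lru = i;
		}
	}
	return lru;	/* slot to displace when nr == DEMO_MAX_LINKS */
}

int main(void)
{
	struct demo_link links[DEMO_MAX_LINKS] = {
		{ "a", 100 }, { "b", 40 }, { "c", 250 }, { "d", 90 },
	};

	printf("discard slot %d\n", pick_lru_slot(links, DEMO_MAX_LINKS));
	return 0;
}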
......@@ -87,32 +87,29 @@ EXPORT_SYMBOL(key_task_permission);
* key_validate - Validate a key.
* @key: The key to be validated.
*
* Check that a key is valid, returning 0 if the key is okay, -EKEYREVOKED if
* the key's type has been removed or if the key has been revoked or
* -EKEYEXPIRED if the key has expired.
* Check that a key is valid, returning 0 if the key is okay, -ENOKEY if the
* key is invalidated, -EKEYREVOKED if the key's type has been removed or if
* the key has been revoked or -EKEYEXPIRED if the key has expired.
*/
int key_validate(struct key *key)
int key_validate(const struct key *key)
{
struct timespec now;
int ret = 0;
if (key) {
/* check it's still accessible */
ret = -EKEYREVOKED;
if (test_bit(KEY_FLAG_REVOKED, &key->flags) ||
test_bit(KEY_FLAG_DEAD, &key->flags))
goto error;
/* check it hasn't expired */
ret = 0;
if (key->expiry) {
now = current_kernel_time();
if (now.tv_sec >= key->expiry)
ret = -EKEYEXPIRED;
}
unsigned long flags = key->flags;
if (flags & (1 << KEY_FLAG_INVALIDATED))
return -ENOKEY;
/* check it's still accessible */
if (flags & ((1 << KEY_FLAG_REVOKED) |
(1 << KEY_FLAG_DEAD)))
return -EKEYREVOKED;
/* check it hasn't expired */
if (key->expiry) {
struct timespec now = current_kernel_time();
if (now.tv_sec >= key->expiry)
return -EKEYEXPIRED;
}
error:
return ret;
return 0;
}
EXPORT_SYMBOL(key_validate);
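key_validate() above now snapshots key->flags once and reports -ENOKEY for keys hit by the invalidation support pulled in with this branch ("KEYS: Add invalidation support"). Below is a hedged user-space sketch that exercises that path through the raw keyctl syscall; the KEYCTL_INVALIDATE value and the exact error returned by the follow-up read are stated as assumptions, not guarantees:

/* Create a "user" key, invalidate it, then show that further use fails.
 * Raw syscalls are used so no libkeyutils is required; "demo" and the
 * payload are illustrative only. */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef KEYCTL_INVALIDATE
#define KEYCTL_INVALIDATE 21		/* assumed value from <linux/keyctl.h> */
#endif
#define KEYCTL_READ 11
#define KEY_SPEC_SESSION_KEYRING (-3)

typedef int32_t key_serial_t;

int main(void)
{
	char buf[64];
	key_serial_t key = syscall(__NR_add_key, "user", "demo",
				   "payload", (size_t)7,
				   KEY_SPEC_SESSION_KEYRING);
	if (key < 0) {
		perror("add_key");
		return 1;
	}

	/* Mark the key invalid; the garbage collector cleans it up later. */
	if (syscall(__NR_keyctl, KEYCTL_INVALIDATE, key) < 0) {
		perror("KEYCTL_INVALIDATE");
		return 1;
	}

	/* Subsequent use should now be rejected, typically with ENOKEY. */
	if (syscall(__NR_keyctl, KEYCTL_READ, key, buf, sizeof(buf)) < 0)
		printf("read after invalidate failed: %s\n", strerror(errno));

	return 0;
}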
......@@ -242,7 +242,7 @@ static int proc_keys_show(struct seq_file *m, void *v)
#define showflag(KEY, LETTER, FLAG) \
(test_bit(FLAG, &(KEY)->flags) ? LETTER : '-')
seq_printf(m, "%08x %c%c%c%c%c%c %5d %4s %08x %5d %5d %-9.9s ",
seq_printf(m, "%08x %c%c%c%c%c%c%c %5d %4s %08x %5d %5d %-9.9s ",
key->serial,
showflag(key, 'I', KEY_FLAG_INSTANTIATED),
showflag(key, 'R', KEY_FLAG_REVOKED),
......@@ -250,6 +250,7 @@ static int proc_keys_show(struct seq_file *m, void *v)
showflag(key, 'Q', KEY_FLAG_IN_QUOTA),
showflag(key, 'U', KEY_FLAG_USER_CONSTRUCT),
showflag(key, 'N', KEY_FLAG_NEGATIVE),
showflag(key, 'i', KEY_FLAG_INVALIDATED),
atomic_read(&key->usage),
xbuf,
key->perm,
......
......@@ -732,6 +732,8 @@ key_ref_t lookup_user_key(key_serial_t id, unsigned long lflags,
if (ret < 0)
goto invalid_key;
key->last_used_at = current_kernel_time().tv_sec;
error:
put_cred(cred);
return key_ref;
......
......@@ -213,12 +213,15 @@ static void dump_common_audit_data(struct audit_buffer *ab,
{
struct task_struct *tsk = current;
if (a->tsk)
tsk = a->tsk;
if (tsk && tsk->pid) {
audit_log_format(ab, " pid=%d comm=", tsk->pid);
audit_log_untrustedstring(ab, tsk->comm);
}
/*
* To keep stack sizes in check, force programmers to notice if they
* start making this union too large! See struct lsm_network_audit
* as an example of how to deal with large data.
*/
BUILD_BUG_ON(sizeof(a->u) > sizeof(void *)*2);
audit_log_format(ab, " pid=%d comm=", tsk->pid);
audit_log_untrustedstring(ab, tsk->comm);
switch (a->type) {
case LSM_AUDIT_DATA_NONE:
......
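dump_common_audit_data() above gains a BUILD_BUG_ON() so the build breaks if the common_audit_data union ever grows past two pointers, which is why large payloads such as struct lsm_network_audit are carried behind a pointer. A small user-space analogue of that guard, using C11 _Static_assert and made-up demo types rather than the kernel's lsm_audit structures:

/* Compile-time size guard, analogous to the BUILD_BUG_ON() added above. */
struct big_net_data { char addr[64]; };	/* too big to embed by value */

struct demo_audit_data {
	int type;
	union {
		const char *path;
		int fd;
		struct big_net_data *net;	/* large data goes behind a pointer */
	} u;
};

/* Fails the build if someone embeds big_net_data directly in the union. */
_Static_assert(sizeof(((struct demo_audit_data *)0)->u) <= 2 * sizeof(void *),
	       "keep demo_audit_data::u small; reference large data by pointer");

int main(void) { return 0; }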
......@@ -701,11 +701,11 @@ int security_file_receive(struct file *file)
return security_ops->file_receive(file);
}
int security_dentry_open(struct file *file, const struct cred *cred)
int security_file_open(struct file *file, const struct cred *cred)
{
int ret;
ret = security_ops->dentry_open(file, cred);
ret = security_ops->file_open(file, cred);
if (ret)
return ret;
......
......@@ -65,14 +65,8 @@ struct avc_cache {
};
struct avc_callback_node {
int (*callback) (u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms,
u32 *out_retained);
int (*callback) (u32 event);
u32 events;
u32 ssid;
u32 tsid;
u16 tclass;
u32 perms;
struct avc_callback_node *next;
};
......@@ -436,9 +430,9 @@ static void avc_audit_pre_callback(struct audit_buffer *ab, void *a)
{
struct common_audit_data *ad = a;
audit_log_format(ab, "avc: %s ",
ad->selinux_audit_data->slad->denied ? "denied" : "granted");
avc_dump_av(ab, ad->selinux_audit_data->slad->tclass,
ad->selinux_audit_data->slad->audited);
ad->selinux_audit_data->denied ? "denied" : "granted");
avc_dump_av(ab, ad->selinux_audit_data->tclass,
ad->selinux_audit_data->audited);
audit_log_format(ab, " for ");
}
......@@ -452,25 +446,23 @@ static void avc_audit_post_callback(struct audit_buffer *ab, void *a)
{
struct common_audit_data *ad = a;
audit_log_format(ab, " ");
avc_dump_query(ab, ad->selinux_audit_data->slad->ssid,
ad->selinux_audit_data->slad->tsid,
ad->selinux_audit_data->slad->tclass);
avc_dump_query(ab, ad->selinux_audit_data->ssid,
ad->selinux_audit_data->tsid,
ad->selinux_audit_data->tclass);
}
/* This is the slow part of avc audit with big stack footprint */
static noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
u32 requested, u32 audited, u32 denied,
struct common_audit_data *a,
unsigned flags)
{
struct common_audit_data stack_data;
struct selinux_audit_data sad = {0,};
struct selinux_late_audit_data slad;
struct selinux_audit_data sad;
if (!a) {
a = &stack_data;
COMMON_AUDIT_DATA_INIT(a, NONE);
a->selinux_audit_data = &sad;
a->type = LSM_AUDIT_DATA_NONE;
}
/*
......@@ -484,104 +476,34 @@ static noinline int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
(flags & MAY_NOT_BLOCK))
return -ECHILD;
slad.tclass = tclass;
slad.requested = requested;
slad.ssid = ssid;
slad.tsid = tsid;
slad.audited = audited;
slad.denied = denied;
sad.tclass = tclass;
sad.requested = requested;
sad.ssid = ssid;
sad.tsid = tsid;
sad.audited = audited;
sad.denied = denied;
a->selinux_audit_data = &sad;
a->selinux_audit_data->slad = &slad;
common_lsm_audit(a, avc_audit_pre_callback, avc_audit_post_callback);
return 0;
}
/**
* avc_audit - Audit the granting or denial of permissions.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions
* @avd: access vector decisions
* @result: result from avc_has_perm_noaudit
* @a: auxiliary audit data
* @flags: VFS walk flags
*
* Audit the granting or denial of permissions in accordance
* with the policy. This function is typically called by
* avc_has_perm() after a permission check, but can also be
* called directly by callers who use avc_has_perm_noaudit()
* in order to separate the permission check from the auditing.
* For example, this separation is useful when the permission check must
* be performed under a lock, to allow the lock to be released
* before calling the auditing code.
*/
inline int avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd, int result, struct common_audit_data *a,
unsigned flags)
{
u32 denied, audited;
denied = requested & ~avd->allowed;
if (unlikely(denied)) {
audited = denied & avd->auditdeny;
/*
* a->selinux_audit_data->auditdeny is TRICKY! Setting a bit in
* this field means that ANY denials should NOT be audited if
* the policy contains an explicit dontaudit rule for that
* permission. Take notice that this is unrelated to the
* actual permissions that were denied. As an example lets
* assume:
*
* denied == READ
* avd.auditdeny & ACCESS == 0 (not set means explicit rule)
* selinux_audit_data->auditdeny & ACCESS == 1
*
* We will NOT audit the denial even though the denied
* permission was READ and the auditdeny checks were for
* ACCESS
*/
if (a &&
a->selinux_audit_data->auditdeny &&
!(a->selinux_audit_data->auditdeny & avd->auditdeny))
audited = 0;
} else if (result)
audited = denied = requested;
else
audited = requested & avd->auditallow;
if (likely(!audited))
return 0;
return slow_avc_audit(ssid, tsid, tclass,
requested, audited, denied,
a, flags);
}
/**
* avc_add_callback - Register a callback for security events.
* @callback: callback function
* @events: security events
* @ssid: source security identifier or %SECSID_WILD
* @tsid: target security identifier or %SECSID_WILD
* @tclass: target security class
* @perms: permissions
*
* Register a callback function for events in the set @events
* related to the SID pair (@ssid, @tsid)
* and the permissions @perms, interpreting
* @perms based on @tclass. Returns %0 on success or
* -%ENOMEM if insufficient memory exists to add the callback.
* Register a callback function for events in the set @events.
* Returns %0 on success or -%ENOMEM if insufficient memory
* exists to add the callback.
*/
int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms,
u32 *out_retained),
u32 events, u32 ssid, u32 tsid,
u16 tclass, u32 perms)
int __init avc_add_callback(int (*callback)(u32 event), u32 events)
{
struct avc_callback_node *c;
int rc = 0;
c = kmalloc(sizeof(*c), GFP_ATOMIC);
c = kmalloc(sizeof(*c), GFP_KERNEL);
if (!c) {
rc = -ENOMEM;
goto out;
......@@ -589,9 +511,6 @@ int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
c->callback = callback;
c->events = events;
c->ssid = ssid;
c->tsid = tsid;
c->perms = perms;
c->next = avc_callbacks;
avc_callbacks = c;
out:
......@@ -731,8 +650,7 @@ int avc_ss_reset(u32 seqno)
for (c = avc_callbacks; c; c = c->next) {
if (c->events & AVC_CALLBACK_RESET) {
tmprc = c->callback(AVC_CALLBACK_RESET,
0, 0, 0, 0, NULL);
tmprc = c->callback(AVC_CALLBACK_RESET);
/* save the first error encountered for the return
value and continue processing the callbacks */
if (!rc)
......
(This file's diff has been collapsed.)
......@@ -49,7 +49,7 @@ struct avc_cache_stats {
/*
* We only need this data after we have decided to send an audit message.
*/
struct selinux_late_audit_data {
struct selinux_audit_data {
u32 ssid;
u32 tsid;
u16 tclass;
......@@ -59,29 +59,87 @@ struct selinux_late_audit_data {
int result;
};
/*
* We collect this at the beginning or during an selinux security operation
*/
struct selinux_audit_data {
/*
* auditdeny is a bit tricky and unintuitive. See the
* comments in avc.c for it's meaning and usage.
*/
u32 auditdeny;
struct selinux_late_audit_data *slad;
};
/*
* AVC operations
*/
void __init avc_init(void);
int avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd,
int result,
struct common_audit_data *a, unsigned flags);
static inline u32 avc_audit_required(u32 requested,
struct av_decision *avd,
int result,
u32 auditdeny,
u32 *deniedp)
{
u32 denied, audited;
denied = requested & ~avd->allowed;
if (unlikely(denied)) {
audited = denied & avd->auditdeny;
/*
* auditdeny is TRICKY! Setting a bit in
* this field means that ANY denials should NOT be audited if
* the policy contains an explicit dontaudit rule for that
* permission. Take notice that this is unrelated to the
* actual permissions that were denied. As an example lets
* assume:
*
* denied == READ
* avd.auditdeny & ACCESS == 0 (not set means explicit rule)
* auditdeny & ACCESS == 1
*
* We will NOT audit the denial even though the denied
* permission was READ and the auditdeny checks were for
* ACCESS
*/
if (auditdeny && !(auditdeny & avd->auditdeny))
audited = 0;
} else if (result)
audited = denied = requested;
else
audited = requested & avd->auditallow;
*deniedp = denied;
return audited;
}
int slow_avc_audit(u32 ssid, u32 tsid, u16 tclass,
u32 requested, u32 audited, u32 denied,
struct common_audit_data *a,
unsigned flags);
/**
* avc_audit - Audit the granting or denial of permissions.
* @ssid: source security identifier
* @tsid: target security identifier
* @tclass: target security class
* @requested: requested permissions
* @avd: access vector decisions
* @result: result from avc_has_perm_noaudit
* @a: auxiliary audit data
* @flags: VFS walk flags
*
* Audit the granting or denial of permissions in accordance
* with the policy. This function is typically called by
* avc_has_perm() after a permission check, but can also be
* called directly by callers who use avc_has_perm_noaudit()
* in order to separate the permission check from the auditing.
* For example, this separation is useful when the permission check must
* be performed under a lock, to allow the lock to be released
* before calling the auditing code.
*/
static inline int avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd,
int result,
struct common_audit_data *a, unsigned flags)
{
u32 audited, denied;
audited = avc_audit_required(requested, avd, result, 0, &denied);
if (likely(!audited))
return 0;
return slow_avc_audit(ssid, tsid, tclass,
requested, audited, denied,
a, flags);
}
#define AVC_STRICT 1 /* Ignore permissive mode. */
int avc_has_perm_noaudit(u32 ssid, u32 tsid,
......@@ -112,11 +170,7 @@ u32 avc_policy_seqno(void);
#define AVC_CALLBACK_AUDITDENY_ENABLE 64
#define AVC_CALLBACK_AUDITDENY_DISABLE 128
int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
u16 tclass, u32 perms,
u32 *out_retained),
u32 events, u32 ssid, u32 tsid,
u16 tclass, u32 perms);
int avc_add_callback(int (*callback)(u32 event), u32 events);
/* Exported to selinuxfs */
int avc_get_hash_stats(char *page);
......
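avc_audit_required() is now an inline helper in this header, and the caller-supplied auditdeny handling is its subtle part. Below is a runnable plain-C restatement with the READ/ACCESS case from the comment worked through; the permission bit values are illustrative stand-ins, not SELinux's real access-vector encoding:

/* Plain-integer restatement of the avc_audit_required() mask logic above;
 * a sketch for illustration, not the kernel function itself. */
#include <stdint.h>
#include <stdio.h>

#define PERM_READ   0x1u
#define PERM_ACCESS 0x2u	/* illustrative permission bits */

static uint32_t audit_required(uint32_t requested, uint32_t allowed,
			       uint32_t auditallow, uint32_t avd_auditdeny,
			       int result, uint32_t caller_auditdeny,
			       uint32_t *deniedp)
{
	uint32_t denied = requested & ~allowed;
	uint32_t audited;

	if (denied) {
		audited = denied & avd_auditdeny;
		/* A caller-supplied auditdeny mask suppresses auditing when the
		 * policy has an explicit dontaudit rule for that permission. */
		if (caller_auditdeny && !(caller_auditdeny & avd_auditdeny))
			audited = 0;
	} else if (result) {
		audited = denied = requested;
	} else {
		audited = requested & auditallow;
	}
	*deniedp = denied;
	return audited;
}

int main(void)
{
	uint32_t denied;
	/* READ is denied, but the policy has an explicit dontaudit rule for
	 * ACCESS (that bit is clear in avd_auditdeny) and the caller asked
	 * about ACCESS, so the denial is not audited - the case the comment
	 * above walks through. */
	uint32_t audited = audit_required(PERM_READ, 0, 0,
					  PERM_READ /* avd_auditdeny */,
					  0, PERM_ACCESS, &denied);
	printf("denied=%#x audited=%#x\n", (unsigned)denied, (unsigned)audited);
	return 0;
}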
......@@ -31,13 +31,15 @@
#define POLICYDB_VERSION_BOUNDARY 24
#define POLICYDB_VERSION_FILENAME_TRANS 25
#define POLICYDB_VERSION_ROLETRANS 26
#define POLICYDB_VERSION_NEW_OBJECT_DEFAULTS 27
#define POLICYDB_VERSION_DEFAULT_TYPE 28
/* Range of policy versions we understand*/
#define POLICYDB_VERSION_MIN POLICYDB_VERSION_BASE
#ifdef CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX
#define POLICYDB_VERSION_MAX CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX_VALUE
#else
#define POLICYDB_VERSION_MAX POLICYDB_VERSION_ROLETRANS
#define POLICYDB_VERSION_MAX POLICYDB_VERSION_DEFAULT_TYPE
#endif
/* Mask for just the mount related flags */
......
......@@ -252,8 +252,7 @@ static void sel_netif_flush(void)
spin_unlock_bh(&sel_netif_lock);
}
static int sel_netif_avc_callback(u32 event, u32 ssid, u32 tsid,
u16 class, u32 perms, u32 *retained)
static int sel_netif_avc_callback(u32 event)
{
if (event == AVC_CALLBACK_RESET) {
sel_netif_flush();
......@@ -292,8 +291,7 @@ static __init int sel_netif_init(void)
register_netdevice_notifier(&sel_netif_netdev_notifier);
err = avc_add_callback(sel_netif_avc_callback, AVC_CALLBACK_RESET,
SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
err = avc_add_callback(sel_netif_avc_callback, AVC_CALLBACK_RESET);
if (err)
panic("avc_add_callback() failed, error %d\n", err);
......
......@@ -297,8 +297,7 @@ static void sel_netnode_flush(void)
spin_unlock_bh(&sel_netnode_lock);
}
static int sel_netnode_avc_callback(u32 event, u32 ssid, u32 tsid,
u16 class, u32 perms, u32 *retained)
static int sel_netnode_avc_callback(u32 event)
{
if (event == AVC_CALLBACK_RESET) {
sel_netnode_flush();
......@@ -320,8 +319,7 @@ static __init int sel_netnode_init(void)
sel_netnode_hash[iter].size = 0;
}
ret = avc_add_callback(sel_netnode_avc_callback, AVC_CALLBACK_RESET,
SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
ret = avc_add_callback(sel_netnode_avc_callback, AVC_CALLBACK_RESET);
if (ret != 0)
panic("avc_add_callback() failed, error %d\n", ret);
......
......@@ -234,8 +234,7 @@ static void sel_netport_flush(void)
spin_unlock_bh(&sel_netport_lock);
}
static int sel_netport_avc_callback(u32 event, u32 ssid, u32 tsid,
u16 class, u32 perms, u32 *retained)
static int sel_netport_avc_callback(u32 event)
{
if (event == AVC_CALLBACK_RESET) {
sel_netport_flush();
......@@ -257,8 +256,7 @@ static __init int sel_netport_init(void)
sel_netport_hash[iter].size = 0;
}
ret = avc_add_callback(sel_netport_avc_callback, AVC_CALLBACK_RESET,
SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0);
ret = avc_add_callback(sel_netport_avc_callback, AVC_CALLBACK_RESET);
if (ret != 0)
panic("avc_add_callback() failed, error %d\n", ret);
......
......@@ -496,6 +496,7 @@ static const struct file_operations sel_policy_ops = {
.read = sel_read_policy,
.mmap = sel_mmap_policy,
.release = sel_release_policy,
.llseek = generic_file_llseek,
};
static ssize_t sel_write_load(struct file *file, const char __user *buf,
......@@ -1232,6 +1233,7 @@ static int sel_make_bools(void)
kfree(bool_pending_names[i]);
kfree(bool_pending_names);
kfree(bool_pending_values);
bool_num = 0;
bool_pending_names = NULL;
bool_pending_values = NULL;
......@@ -1532,11 +1534,6 @@ static int sel_make_initcon_files(struct dentry *dir)
return 0;
}
static inline unsigned int sel_div(unsigned long a, unsigned long b)
{
return a / b - (a % b < 0);
}
static inline unsigned long sel_class_to_ino(u16 class)
{
return (class * (SEL_VEC_MAX + 1)) | SEL_CLASS_INO_OFFSET;
......@@ -1544,7 +1541,7 @@ static inline unsigned long sel_class_to_ino(u16 class)
static inline u16 sel_ino_to_class(unsigned long ino)
{
return sel_div(ino & SEL_INO_MASK, SEL_VEC_MAX + 1);
return (ino & SEL_INO_MASK) / (SEL_VEC_MAX + 1);
}
static inline unsigned long sel_perm_to_ino(u16 class, u32 perm)
......@@ -1831,7 +1828,7 @@ static int sel_fill_super(struct super_block *sb, void *data, int silent)
[SEL_REJECT_UNKNOWN] = {"reject_unknown", &sel_handle_unknown_ops, S_IRUGO},
[SEL_DENY_UNKNOWN] = {"deny_unknown", &sel_handle_unknown_ops, S_IRUGO},
[SEL_STATUS] = {"status", &sel_handle_status_ops, S_IRUGO},
[SEL_POLICY] = {"policy", &sel_policy_ops, S_IRUSR},
[SEL_POLICY] = {"policy", &sel_policy_ops, S_IRUGO},
/* last one */ {""}
};
ret = simple_fill_super(sb, SELINUX_MAGIC, selinux_files);
......
......@@ -74,6 +74,26 @@ static inline int mls_context_cpy_low(struct context *dst, struct context *src)
return rc;
}
/*
* Sets both levels in the MLS range of 'dst' to the high level of 'src'.
*/
static inline int mls_context_cpy_high(struct context *dst, struct context *src)
{
int rc;
dst->range.level[0].sens = src->range.level[1].sens;
rc = ebitmap_cpy(&dst->range.level[0].cat, &src->range.level[1].cat);
if (rc)
goto out;
dst->range.level[1].sens = src->range.level[1].sens;
rc = ebitmap_cpy(&dst->range.level[1].cat, &src->range.level[1].cat);
if (rc)
ebitmap_destroy(&dst->range.level[0].cat);
out:
return rc;
}
static inline int mls_context_cmp(struct context *c1, struct context *c2)
{
return ((c1->range.level[0].sens == c2->range.level[0].sens) &&
......
......@@ -517,6 +517,8 @@ int mls_compute_sid(struct context *scontext,
{
struct range_trans rtr;
struct mls_range *r;
struct class_datum *cladatum;
int default_range = 0;
if (!policydb.mls_enabled)
return 0;
......@@ -530,6 +532,28 @@ int mls_compute_sid(struct context *scontext,
r = hashtab_search(policydb.range_tr, &rtr);
if (r)
return mls_range_set(newcontext, r);
if (tclass && tclass <= policydb.p_classes.nprim) {
cladatum = policydb.class_val_to_struct[tclass - 1];
if (cladatum)
default_range = cladatum->default_range;
}
switch (default_range) {
case DEFAULT_SOURCE_LOW:
return mls_context_cpy_low(newcontext, scontext);
case DEFAULT_SOURCE_HIGH:
return mls_context_cpy_high(newcontext, scontext);
case DEFAULT_SOURCE_LOW_HIGH:
return mls_context_cpy(newcontext, scontext);
case DEFAULT_TARGET_LOW:
return mls_context_cpy_low(newcontext, tcontext);
case DEFAULT_TARGET_HIGH:
return mls_context_cpy_high(newcontext, tcontext);
case DEFAULT_TARGET_LOW_HIGH:
return mls_context_cpy(newcontext, tcontext);
}
/* Fallthrough */
case AVTAB_CHANGE:
if ((tclass == policydb.process_class) || (sock == true))
......
(8 more file diffs have been collapsed.)
......@@ -860,7 +860,6 @@ struct tomoyo_aggregator {
/* Structure for policy manager. */
struct tomoyo_manager {
struct tomoyo_acl_head head;
bool is_domain; /* True if manager is a domainname. */
/* A path to program or a domainname. */
const struct tomoyo_path_info *manager;
};
......
(2 more file diffs have been collapsed.)