- 11 5月, 2012 3 次提交
-
-
由 David Howells 提交于
Add support for invalidating a key - which renders it immediately invisible to further searches and causes the garbage collector to immediately wake up, remove it from keyrings and then destroy it when it's no longer referenced. It's better not to do this with keyctl_revoke() as that marks the key to start returning -EKEYREVOKED to searches when what is actually desired is to have the key refetched. To invalidate a key the caller must be granted SEARCH permission by the key. This may be too strict. It may be better to also permit invalidation if the caller has any of READ, WRITE or SETATTR permission. The primary use for this is to evict keys that are cached in special keyrings, such as the DNS resolver or an ID mapper. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Do an LRU discard in keyrings that are full rather than returning ENFILE. To perform this, a time_t is added to the key struct and updated by the creation of a link to a key and by a key being found as the result of a search. At the completion of a successful search, the keyrings in the path between the root of the search and the first found link to it also have their last-used times updated. Note that discarding a link to a key from a keyring does not necessarily destroy the key as there may be references held by other places. An alternate discard method that might suffice is to perform FIFO discard from the keyring, using the spare 2-byte hole in the keylist header as the index of the next link to be discarded. This is useful when using a keyring as a cache for DNS results or foreign filesystem IDs. This can be tested by the following. As root do: echo 1000 >/proc/sys/kernel/keys/root_maxkeys kr=`keyctl newring foo @s` for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done Without this patch ENFILE should be reported when the keyring fills up. With this patch, the keyring discards keys in an LRU fashion. Note that the stored LRU time has a granularity of 1s. After doing this, /proc/key-users can be observed and should show that most of the 2000 keys have been discarded: [root@andromeda ~]# cat /proc/key-users 0: 517 516/516 513/1000 5249/20000 The "513/1000" here is the number of quota-accounted keys present for this user out of the maximum permitted. In /proc/keys, the keyring shows the number of keys it has and the number of slots it has allocated: [root@andromeda ~]# grep foo /proc/keys 200c64c4 I--Q-- 1 perm 3b3f0000 0 0 keyring foo: 509/509 The maximum is (PAGE_SIZE - header) / key pointer size. That's typically 509 on a 64-bit system and 1020 on a 32-bit system. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Make the keys garbage collector invoke synchronize_rcu() prior to destroying keys with a zero usage count. This means that a key can be examined under the RCU read lock in the safe knowledge that it won't get deallocated until after the lock is released - even if its usage count becomes zero whilst we're looking at it. This is useful in keyring search vs key link. Consider a keyring containing a link to a key. That link can be replaced in-place in the keyring without requiring an RCU copy-and-replace on the keyring contents without breaking a search underway on that keyring when the displaced key is released, provided the key is actually destroyed only after the RCU read lock held by the search algorithm is released. This permits __key_link() to replace a key without having to reallocate the key payload. A key gets replaced if a new key being linked into a keyring has the same type and description. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJeff Layton <jlayton@redhat.com>
-
- 30 4月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode. When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer). End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet. NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day. The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface. Tested-by: NMichael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org # needed for systemd/autofs interaction fix Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 4月, 2012 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warning in spi.h (copy/paste): Warning(include/linux/spi/spi.h:365): No description found for parameter 'unprepare_transfer_hardware' Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
- 27 4月, 2012 1 次提交
-
-
由 Robert Jarzmik 提交于
In 3.3, gpio wakeup setting was broken. The call enable_irq_wake() didn't set up the PXA gpio registers (PWER, ...) anymore. Fix it at least for pxa27x. The driver doesn't seem to be used in pxa25x (weird ...), and the fix doesn't extend to pxa3xx and pxa95x (which don't have a gpio_set_wake() available). Signed-off-by: NRobert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: NHaojian Zhuang <haojian.zhuang@gmail.com>
-
- 26 4月, 2012 1 次提交
-
-
由 Ying Han 提交于
The "pgsteal" stat is confusing because it counts both direct reclaim as well as background reclaim. However, we have "kswapd_steal" which also counts background reclaim value. This patch fixes it and also makes it match the existng "pgscan_" stats. Test: pgsteal_kswapd_dma32 447623 pgsteal_kswapd_normal 42272677 pgsteal_kswapd_movable 0 pgsteal_direct_dma32 2801 pgsteal_direct_normal 44353270 pgsteal_direct_movable 0 Signed-off-by: NYing Han <yinghan@google.com> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 4月, 2012 1 次提交
-
-
由 Alan Stern 提交于
This patch (as1545) fixes a problem affecting several ASUS computers: The machine crashes or corrupts memory when going into suspend if the ehci-hcd driver is bound to any controllers. Users have been forced to unbind or unload ehci-hcd before putting their systems to sleep. After extensive testing, it was determined that the machines don't like going into suspend when any EHCI controllers are in the PCI D3 power state. Presumably this is a firmware bug, but there's nothing we can do about it except to avoid putting the controllers in D3 during system sleep. The patch adds a new flag to indicate whether the problem is present, and avoids changing the controller's power state if the flag is set. Runtime suspend is unaffected; this matters only for system suspend. However as a side effect, the controller will not respond to remote wakeup requests while the system is asleep. Hence USB wakeup is not functional -- but of course, this is already true in the current state of affairs. This fixes Bugzilla #42728. Signed-off-by: NAlan Stern <stern@rowland.harvard.edu> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Tested-by: NAndrey Rahmatullin <wrar@wrar.name> Tested-by: NOleksij Rempel (fishor) <bug-track@fisher-privat.net> Cc: stable <stable@vger.kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 23 4月, 2012 3 次提交
-
-
由 Carlos Chinea 提交于
Remove custom hack and make use of the notifier chain interfaces for delivering events from the ports to their associated clients. Clients that want to receive port events need to register their callbacks using hsi_register_port_event(). The callbacks can be called in interrupt context. Use hsi_unregestier_port_event() to undo the registration. Signed-off-by: NCarlos Chinea <carlos.chinea@nokia.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NLinus Walleij <linus.walleij@linaro.org>
-
由 Carlos Chinea 提交于
Use the proper release mechanism for hsi_controller and hsi_ports structures. Free the structures through their associated device release callbacks. Signed-off-by: NCarlos Chinea <carlos.chinea@nokia.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NLinus Walleij <linus.walleij@linaro.org>
-
由 Benjamin Herrenschmidt 提交于
This is meant typically to allow a PIC driver's irq domain map() callback to establish sane defaults for the interrupt (and make sure that the HW and the irq_desc are in sync as far as the trigger is concerned). The irq core may not call the set_trigger callback if it thinks the trigger is already set to the right setting, so we need to ensure new descriptors are properly synchronized with the hardware. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 21 4月, 2012 6 次提交
-
-
由 Al Viro 提交于
it's always current->mm Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Trond Myklebust 提交于
The NFSv4 spec is ambiguous about whether or not it is permissible to reuse open owner names, so play it safe. This patch adds a timestamp to the state_owner structure, and combines that with the IDA based uniquifier. Fixes a regression whereby the Linux server returns NFS4ERR_BAD_SEQID. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Chuanxiao Dong 提交于
MMC bus is using legacy suspend/resume method, which is not compatible if runtime pm callbacks are used. In this scenario, MMC bus suspend/resume callbacks cannot be called when system entering S3. So change to use the new defined dev_pm_ops for system sleeping mode. Tested on AM335x Platform. Solves major issue/crash reported at http://www.mail-archive.com/linux-omap@vger.kernel.org/msg65425.htmlSigned-off-by: NChuanxiao Dong <chuanxiao.dong@intel.com> Tested-by: NHebbar, Gururaja <gururaja.hebbar@ti.com> Acked-by: NLinus Walleij <linus.walleij@linaro.org> Acked-by: NUlf Hansson <ulf.hansson@stericsson.com> Signed-off-by: NChris Ball <cjb@laptop.org>
-
由 Linus Torvalds 提交于
This continues the theme started with vm_brk() and vm_munmap(): vm_mmap() does the same thing as do_mmap(), but additionally does the required VM locking. This uninlines (and rewrites it to be clearer) do_mmap(), which sadly duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have to export our internal do_mmap_pgoff() function. Some day we hopefully don't have to export do_mmap() either, if all modular users can become the simpler vm_mmap() instead. We're actually very close to that already, with the notable exception of the (broken) use in i810, and a couple of stragglers in binfmt_elf. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Like the vm_brk() function, this is the same as "do_munmap()", except it does the VM locking for the caller. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
It does the same thing as "do_brk()", except it handles the VM locking too. It turns out that all external callers want that anyway, so we can make do_brk() static to just mm/mmap.c while at it. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 4月, 2012 1 次提交
-
-
由 Will Drewry 提交于
This change is inspired by https://lkml.org/lkml/2012/4/16/14 which fixes the build warnings for arches that don't support CONFIG_HAVE_ARCH_SECCOMP_FILTER. In particular, there is no requirement for the return value of secure_computing() to be checked unless the architecture supports seccomp filter. Instead of silencing the warnings with (void) a new static inline is added to encode the expected behavior in a compiler and human friendly way. v2: - cleans things up with a static inline - removes sfr's signed-off-by since it is a different approach v1: - matches sfr's original change Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 17 4月, 2012 3 次提交
-
-
由 Stephen Rothwell 提交于
Fixes this error message when CONFIG_SECCOMP is not set: arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter': arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value] Signed-off-by: NStephen Rothwell <sfr@ozlabs.au.ibm.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Paul Gortmaker 提交于
The combination of commit 1b1247dd "mfd: Add support for RICOH PMIC RC5T583" and commit 6ffc3270 "regulator: Add support for RICOH PMIC RC5T583 regulator" are causing the i386 allmodconfig builds to fail with this: ERROR: "rc5t583_update" [drivers/regulator/rc5t583-regulator.ko] undefined! ERROR: "rc5t583_set_bits" [drivers/regulator/rc5t583-regulator.ko] undefined! ERROR: "rc5t583_clear_bits" [drivers/regulator/rc5t583-regulator.ko] undefined! ERROR: "rc5t583_read" [drivers/regulator/rc5t583-regulator.ko] undefined! and this: ERROR: "rc5t583_ext_power_req_config" [drivers/regulator/rc5t583-regulator.ko] undefined! For the 1st four, make the simple ops static inline, instead of polluting the namespace with trivial exports. For the last one, add an EXPORT_SYMBOL. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Jeff Layton 提交于
The cld.h file contains the definition of the upcall format to talk with nfsdcld. When I added the file though, I neglected to add it to the headers-y target, so make headers_install wasn't installing it. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 16 4月, 2012 2 次提交
-
-
由 Peter Ujfalusi 提交于
Complete the separation of the twl6040 from the twl core since it is a separate chip, not part of the twl6030 PMIC. Make the needed Kconfig changes for the depending drivers at the same time to avoid breaking the kernel build (vibra, ASoC components). Signed-off-by: NPeter Ujfalusi <peter.ujfalusi@ti.com> Reviewed-by: NMark Brown <broonie@opensource.wolfsonicro.com> Acked-by: NTony Lindgren <tony@atomide.com> Acked-by: NDmitry Torokhov <dtor@mail.ru> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Daniel Lezcano 提交于
The ux500 default config enables the db5500 and the db8500. The incoming cpuidle driver uses the 'prcmu_enable_wakeups' and the 'prcmu_set_power_state' functions but these ones are defined but not implemented for the db5500, leading to an unresolved symbol error at link time. In order to compile, we have to disable the db5500 support which is not acceptable for the default config. I noticed there are also some other functions which are defined but not implemented. This patch fix this by removing the functions definitions and move out of the config section the empty functions which are normally used when the DB550 config is disabled. Only the functions which are not implemented are concerned by this modification. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Acked-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
- 14 4月, 2012 11 次提交
-
-
由 Lubos Lunak 提交于
GCC's NULL is actually __null, which allows detecting some questionable NULL usage and warn about it. Moreover each platform/compiler should have its own stddef.h anyway (which is different from linux/stddef.h). So there's no good reason to leak kernel's NULL to userspace and override what the compiler provides. Signed-off-by: NLuboš Luňák <l.lunak@suse.cz> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Drewry 提交于
This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP, and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE. When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit SECCOMP_RET_DATA mask of the BPF program return value will be passed as the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG. If the subordinate process is not using seccomp filter, then no system call notifications will occur even if the option is specified. If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE is returned, the system call will not be executed and an -ENOSYS errno will be returned to userspace. This change adds a dependency on the system call slow path. Any future efforts to use the system call fast path for seccomp filter will need to address this restriction. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: - rebase - comment fatal_signal check - acked-by - drop secure_computing_int comment v17: - ... v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu) [note PT_TRACE_MASK disappears in linux-next] v15: - add audit support for non-zero return codes - clean up style (indan@nul.nu) v14: - rebase/nochanges v13: - rebase on to 88ebdda6 (Brings back a change to ptrace.c and the masks.) v12: - rebase to linux-next - use ptrace_event and update arch/Kconfig to mention slow-path dependency - drop all tracehook changes and inclusion (oleg@redhat.com) v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator (indan@nul.nu) v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP v9: - n/a v8: - guarded PTRACE_SECCOMP use with an ifdef v7: - introduced Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Adds a new return value to seccomp filters that triggers a SIGSYS to be delivered with the new SYS_SECCOMP si_code. This allows in-process system call emulation, including just specifying an errno or cleanly dumping core, rather than just dying. Suggested-by: NMarkus Gutschke <markus@chromium.org> Suggested-by: NJulien Tinnes <jln@chromium.org> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: - acked-by, rebase - don't mention secure_computing_int() anymore v15: - use audit_seccomp/skip - pad out error spacing; clean up switch (indan@nul.nu) v14: - n/a v13: - rebase on to 88ebdda6 v12: - rebase on to linux-next v11: - clarify the comment (indan@nul.nu) - s/sigtrap/sigsys v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig note suggested-by (though original suggestion had other behaviors) v9: - changes to SIGILL v8: - clean up based on changes to dependent patches v7: - introduction Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
This change adds the SECCOMP_RET_ERRNO as a valid return value from a seccomp filter. Additionally, it makes the first use of the lower 16-bits for storing a filter-supplied errno. 16-bits is more than enough for the errno-base.h calls. Returning errors instead of immediately terminating processes that violate seccomp policy allow for broader use of this functionality for kernel attack surface reduction. For example, a linux container could maintain a whitelist of pre-existing system calls but drop all new ones with errnos. This would keep a logically static attack surface while providing errnos that may allow for graceful failure without the downside of do_exit() on a bad call. This change also changes the signature of __secure_computing. It appears the only direct caller is the arm entry code and it clobbers any possible return value (register) immediately. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: - fix up comments and rebase - fix bad var name which was fixed in later revs - remove _int() and just change the __secure_computing signature v16-v17: ... v15: - use audit_seccomp and add a skip label. (eparis@redhat.com) - clean up and pad out return codes (indan@nul.nu) v14: - no change/rebase v13: - rebase on to 88ebdda6 v12: - move to WARN_ON if filter is NULL (oleg@redhat.com, luto@mit.edu, keescook@chromium.org) - return immediately for filter==NULL (keescook@chromium.org) - change evaluation to only compare the ACTION so that layered errnos don't result in the lowest one being returned. (keeschook@chromium.org) v11: - check for NULL filter (keescook@chromium.org) v10: - change loaders to fn v9: - n/a v8: - update Kconfig to note new need for syscall_set_return_value. - reordered such that TRAP behavior follows on later. - made the for loop a little less indent-y v7: - introduced Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Kees Cook 提交于
This consolidates the seccomp filter error logging path and adds more details to the audit log. Signed-off-by: NWill Drewry <wad@chromium.org> Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: make compat= permanent in the record v15: added a return code to the audit_seccomp path by wad@chromium.org (suggested by eparis@redhat.com) v*: original by keescook@chromium.org Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
[This patch depends on luto@mit.edu's no_new_privs patch: https://lkml.org/lkml/2012/1/30/264 The whole series including Andrew's patches can be found here: https://github.com/redpig/linux/tree/seccomp Complete diff here: https://github.com/redpig/linux/compare/1dc65fed...seccomp ] This patch adds support for seccomp mode 2. Mode 2 introduces the ability for unprivileged processes to install system call filtering policy expressed in terms of a Berkeley Packet Filter (BPF) program. This program will be evaluated in the kernel for each system call the task makes and computes a result based on data in the format of struct seccomp_data. A filter program may be installed by calling: struct sock_fprog fprog = { ... }; ... prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog); The return value of the filter program determines if the system call is allowed to proceed or denied. If the first filter program installed allows prctl(2) calls, then the above call may be made repeatedly by a task to further reduce its access to the kernel. All attached programs must be evaluated before a system call will be allowed to proceed. Filter programs will be inherited across fork/clone and execve. However, if the task attaching the filter is unprivileged (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task. This ensures that unprivileged tasks cannot attach filters that affect privileged tasks (e.g., setuid binary). There are a number of benefits to this approach. A few of which are as follows: - BPF has been exposed to userland for a long time - BPF optimization (and JIT'ing) are well understood - Userland already knows its ABI: system call numbers and desired arguments - No time-of-check-time-of-use vulnerable data accesses are possible. - system call arguments are loaded on access only to minimize copying required for system call policy decisions. Mode 2 support is restricted to architectures that enable HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on syscall_get_arguments(). The full desired scope of this feature will add a few minor additional requirements expressed later in this series. Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be the desired additional functionality. No architectures are enabled in this patch. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NIndan Zupancic <indan@nul.nu> Acked-by: NEric Paris <eparis@redhat.com> Reviewed-by: NKees Cook <keescook@chromium.org> v18: - rebase to v3.4-rc2 - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org) - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu) - add a comment for get_u32 regarding endianness (akpm@) - fix other typos, style mistakes (akpm@) - added acked-by v17: - properly guard seccomp filter needed headers (leann@ubuntu.com) - tighten return mask to 0x7fff0000 v16: - no change v15: - add a 4 instr penalty when counting a path to account for seccomp_filter size (indan@nul.nu) - drop the max insns to 256KB (indan@nul.nu) - return ENOMEM if the max insns limit has been hit (indan@nul.nu) - move IP checks after args (indan@nul.nu) - drop !user_filter check (indan@nul.nu) - only allow explicit bpf codes (indan@nul.nu) - exit_code -> exit_sig v14: - put/get_seccomp_filter takes struct task_struct (indan@nul.nu,keescook@chromium.org) - adds seccomp_chk_filter and drops general bpf_run/chk_filter user - add seccomp_bpf_load for use by net/core/filter.c - lower max per-process/per-hierarchy: 1MB - moved nnp/capability check prior to allocation (all of the above: indan@nul.nu) v13: - rebase on to 88ebdda6 v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com) - removed copy_seccomp (keescook@chromium.org,indan@nul.nu) - reworded the prctl_set_seccomp comment (indan@nul.nu) v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com) - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu) - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu) - pare down Kconfig doc reference. - extra comment clean up v10: - seccomp_data has changed again to be more aesthetically pleasing (hpa@zytor.com) - calling convention is noted in a new u32 field using syscall_get_arch. This allows for cross-calling convention tasks to use seccomp filters. (hpa@zytor.com) - lots of clean up (thanks, Indan!) v9: - n/a v8: - use bpf_chk_filter, bpf_run_filter. update load_fns - Lots of fixes courtesy of indan@nul.nu: -- fix up load behavior, compat fixups, and merge alloc code, -- renamed pc and dropped __packed, use bool compat. -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch dependencies v7: (massive overhaul thanks to Indan, others) - added CONFIG_HAVE_ARCH_SECCOMP_FILTER - merged into seccomp.c - minimal seccomp_filter.h - no config option (part of seccomp) - no new prctl - doesn't break seccomp on systems without asm/syscall.h (works but arg access always fails) - dropped seccomp_init_task, extra free functions, ... - dropped the no-asm/syscall.h code paths - merges with network sk_run_filter and sk_chk_filter v6: - fix memory leak on attach compat check failure - require no_new_privs || CAP_SYS_ADMIN prior to filter installation. (luto@mit.edu) - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com) - cleaned up Kconfig (amwang@redhat.com) - on block, note if the call was compat (so the # means something) v5: - uses syscall_get_arguments (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org) - uses union-based arg storage with hi/lo struct to handle endianness. Compromises between the two alternate proposals to minimize extra arg shuffling and account for endianness assuming userspace uses offsetof(). (mcgrathr@chromium.org, indan@nul.nu) - update Kconfig description - add include/seccomp_filter.h and add its installation - (naive) on-demand syscall argument loading - drop seccomp_t (eparis@redhat.com) v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS - now uses current->no_new_privs (luto@mit.edu,torvalds@linux-foundation.com) - assign names to seccomp modes (rdunlap@xenotime.net) - fix style issues (rdunlap@xenotime.net) - reworded Kconfig entry (rdunlap@xenotime.net) v3: - macros to inline (oleg@redhat.com) - init_task behavior fixed (oleg@redhat.com) - drop creator entry and extra NULL check (oleg@redhat.com) - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com) - adds tentative use of "always_unprivileged" as per torvalds@linux-foundation.org and luto@mit.edu v2: - (patch 2 only) Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Replaces the seccomp_t typedef with struct seccomp to match modern kernel style. Signed-off-by: NWill Drewry <wad@chromium.org> Reviewed-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NEric Paris <eparis@redhat.com> v18: rebase ... v14: rebase/nochanges v13: rebase on to 88ebdda6 v12: rebase on to linux-next v8-v11: no changes v7: struct seccomp_struct -> struct seccomp v6: original inclusion in this series. Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Any other users of bpf_*_filter that take a struct sock_fprog from userspace will need to be able to also accept a compat_sock_fprog if the arch supports compat calls. This change allows the existing compat_sock_fprog be shared. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NEric Paris <eparis@redhat.com> v18: tasered by the apostrophe police v14: rebase/nochanges v13: rebase on to 88ebdda6 v12: rebase on to linux-next v11: introduction Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Introduces a new BPF ancillary instruction that all LD calls will be mapped through when skb_run_filter() is being used for seccomp BPF. The rewriting will be done using a secondary chk_filter function that is run after skb_chk_filter. The code change is guarded by CONFIG_SECCOMP_FILTER which is added, along with the seccomp_bpf_load() function later in this series. This is based on http://lkml.org/lkml/2012/3/2/141Suggested-by: NIndan Zupancic <indan@nul.nu> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NEric Paris <eparis@redhat.com> v18: rebase ... v15: include seccomp.h explicitly for when seccomp_bpf_load exists. v14: First cut using a single additional instruction ... v13: made bpf functions generic. Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Andy Lutomirski 提交于
With this change, calling prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) disables privilege granting operations at execve-time. For example, a process will not be able to execute a setuid binary to change their uid or gid if this bit is set. The same is true for file capabilities. Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that LSMs respect the requested behavior. To determine if the NO_NEW_PRIVS bit is set, a task may call prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); It returns 1 if set and 0 if it is not set. If any of the arguments are non-zero, it will return -1 and set errno to -EINVAL. (PR_SET_NO_NEW_PRIVS behaves similarly.) This functionality is desired for the proposed seccomp filter patch series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the system call behavior for itself and its child tasks without being able to impact the behavior of a more privileged task. Another potential use is making certain privileged operations unprivileged. For example, chroot may be considered "safe" if it cannot affect privileged tasks. Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is set and AppArmor is in use. It is fixed in a subsequent patch. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> v18: updated change desc v17: using new define values as per 3.4 Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Michael S. Tsirkin 提交于
The skb struct ubuf_info callback gets passed struct ubuf_info itself, not the arg value as the field name and the function signature seem to imply. Rename the arg field to ctx to match usage, add documentation and change the callback argument type to make usage clear and to have compiler check correctness. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 4月, 2012 2 次提交
-
-
由 Mark Brown 提交于
The AMBA bus regulator support is being used to model on/off switches for power domains which isn't terribly idiomatic for modern kernels with the generic power domain code and creates integration problems on platforms which don't use regulators for their power domains as it's hard to tell the difference between a regulator that is needed but failed to be provided and one that isn't supposed to be there (though DT does make that easier). Platforms that wish to use the regulator API to manage their power domains can indirect via the power domain interface. This feature is only used with the vape supply of the db8500 PRCMU driver which supplies the UARTs and MMC controllers, none of which have support for managing vcore at runtime in mainline (only pl022 SPI controller does). Update that supply to have an always_on constraint until the power domain support for the system is updated so that it is enabled for these users, this is likely to have no impact on practical systems as probably at least one of these devices will be active and cause AMBA to hold the supply on anyway. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: NLinus Walleij <linus.walleij@linaro.org> Tested-by: NShawn Guo <shawn.guo@linaro.org> Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
-
由 Paul Gortmaker 提交于
Using IS_ENABLED() within C (vs. within CPP #if statements) in its current form requires us to actually define every possible bool/tristate Kconfig option twice (__enabled_* and __enabled_*_MODULE variants). This results in a huge autoconf.h file, on the order of 16k lines for a x86_64 defconfig. Fixing IS_ENABLED to be able to work on the smaller subset of just things that we really have defined is step one to fixing this. Which means it has to not choke when fed non-enabled options, such as: include/linux/netdevice.h:964:1: warning: "__enabled_CONFIG_FCOE_MODULE" is not defined [-Wundef] The original prototype of how to implement a C and preprocessor compatible way of doing this came from the Google+ user "comex ." in response to Linus' crowdsourcing challenge for a possible improvement on his earlier C specific solution: #define config_enabled(x) (__stringify(x)[0] == '1') In this implementation, I've chosen variable names that hopefully make how it works more understandable. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 4月, 2012 3 次提交
-
-
由 Miklos Szeredi 提交于
Use the ISO C standard compliant form instead of the gcc extension in the interface definition. Reported-by: NShachar Sharon <ssnail@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Grant Likely 提交于
This patch replaces the old global setting of irq_virq_count that is only used by the NOMAP mapping and instead uses a revmap_data property so that the maximum NOMAP allocation can be set per NOMAP irq_domain. There is exactly one user of irq_virq_count in-tree right now: PS3. Also, irq_virq_count is only useful for the NOMAP mapping. So, instead of having a single global irq_virq_count values, this change drops it entirely and added a max_irq argument to irq_domain_add_nomap(). That makes it a property of an individual nomap irq domain instead of a global system settting. Signed-off-by: NGrant Likely <grant.likely@secretlab.ca> Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Milton Miller <miltonm@bga.com>
-
由 Alex Williamson 提交于
We've been adding new mappings, but not destroying old mappings. This can lead to a page leak as pages are pinned using get_user_pages, but only unpinned with put_page if they still exist in the memslots list on vm shutdown. A memslot that is destroyed while an iommu domain is enabled for the guest will therefore result in an elevated page reference count that is never cleared. Additionally, without this fix, the iommu is only programmed with the first translation for a gpa. This can result in peer-to-peer errors if a mapping is destroyed and replaced by a new mapping at the same gpa as the iommu will still be pointing to the original, pinned memory address. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 11 4月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Marc Merlin reported many order-1 allocations failures in TX path on its wireless setup, that dont make any sense with MTU=1500 network, and non SG capable hardware. After investigation, it turns out TCP uses sk_stream_alloc_skb() and used as a convention skb_tailroom(skb) to know how many bytes of data payload could be put in this skb (for non SG capable devices) Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER + sizeof(struct skb_shared_info) being above 2048) Later, mac80211 layer need to add some bytes at the tail of skb (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is available has to call pskb_expand_head() and request order-1 allocations. This patch changes sk_stream_alloc_skb() so that only sk->sk_prot->max_header bytes of headroom are reserved, and use a new skb field, avail_size to hold the data payload limit. This way, order-0 allocations done by TCP stack can leave more than 2 KB of tailroom and no more allocation is performed in mac80211 layer (or any layer needing some tailroom) avail_size is unioned with mark/dropcount, since mark will be set later in IP stack for output packets. Therefore, skb size is unchanged. Reported-by: NMarc MERLIN <marc@merlins.org> Tested-by: NMarc MERLIN <marc@merlins.org> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-