- 17 4月, 2012 1 次提交
-
-
由 Stephen Rothwell 提交于
Fixes this error message when CONFIG_SECCOMP is not set: arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter': arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value] Signed-off-by: NStephen Rothwell <sfr@ozlabs.au.ibm.com> Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 14 4月, 2012 9 次提交
-
-
由 Will Drewry 提交于
This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP, and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE. When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit SECCOMP_RET_DATA mask of the BPF program return value will be passed as the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG. If the subordinate process is not using seccomp filter, then no system call notifications will occur even if the option is specified. If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE is returned, the system call will not be executed and an -ENOSYS errno will be returned to userspace. This change adds a dependency on the system call slow path. Any future efforts to use the system call fast path for seccomp filter will need to address this restriction. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: - rebase - comment fatal_signal check - acked-by - drop secure_computing_int comment v17: - ... v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu) [note PT_TRACE_MASK disappears in linux-next] v15: - add audit support for non-zero return codes - clean up style (indan@nul.nu) v14: - rebase/nochanges v13: - rebase on to 88ebdda6 (Brings back a change to ptrace.c and the masks.) v12: - rebase to linux-next - use ptrace_event and update arch/Kconfig to mention slow-path dependency - drop all tracehook changes and inclusion (oleg@redhat.com) v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator (indan@nul.nu) v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP v9: - n/a v8: - guarded PTRACE_SECCOMP use with an ifdef v7: - introduced Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Adds a new return value to seccomp filters that triggers a SIGSYS to be delivered with the new SYS_SECCOMP si_code. This allows in-process system call emulation, including just specifying an errno or cleanly dumping core, rather than just dying. Suggested-by: NMarkus Gutschke <markus@chromium.org> Suggested-by: NJulien Tinnes <jln@chromium.org> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: - acked-by, rebase - don't mention secure_computing_int() anymore v15: - use audit_seccomp/skip - pad out error spacing; clean up switch (indan@nul.nu) v14: - n/a v13: - rebase on to 88ebdda6 v12: - rebase on to linux-next v11: - clarify the comment (indan@nul.nu) - s/sigtrap/sigsys v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig note suggested-by (though original suggestion had other behaviors) v9: - changes to SIGILL v8: - clean up based on changes to dependent patches v7: - introduction Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
This change adds the SECCOMP_RET_ERRNO as a valid return value from a seccomp filter. Additionally, it makes the first use of the lower 16-bits for storing a filter-supplied errno. 16-bits is more than enough for the errno-base.h calls. Returning errors instead of immediately terminating processes that violate seccomp policy allow for broader use of this functionality for kernel attack surface reduction. For example, a linux container could maintain a whitelist of pre-existing system calls but drop all new ones with errnos. This would keep a logically static attack surface while providing errnos that may allow for graceful failure without the downside of do_exit() on a bad call. This change also changes the signature of __secure_computing. It appears the only direct caller is the arm entry code and it clobbers any possible return value (register) immediately. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: - fix up comments and rebase - fix bad var name which was fixed in later revs - remove _int() and just change the __secure_computing signature v16-v17: ... v15: - use audit_seccomp and add a skip label. (eparis@redhat.com) - clean up and pad out return codes (indan@nul.nu) v14: - no change/rebase v13: - rebase on to 88ebdda6 v12: - move to WARN_ON if filter is NULL (oleg@redhat.com, luto@mit.edu, keescook@chromium.org) - return immediately for filter==NULL (keescook@chromium.org) - change evaluation to only compare the ACTION so that layered errnos don't result in the lowest one being returned. (keeschook@chromium.org) v11: - check for NULL filter (keescook@chromium.org) v10: - change loaders to fn v9: - n/a v8: - update Kconfig to note new need for syscall_set_return_value. - reordered such that TRAP behavior follows on later. - made the for loop a little less indent-y v7: - introduced Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Kees Cook 提交于
This consolidates the seccomp filter error logging path and adds more details to the audit log. Signed-off-by: NWill Drewry <wad@chromium.org> Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> v18: make compat= permanent in the record v15: added a return code to the audit_seccomp path by wad@chromium.org (suggested by eparis@redhat.com) v*: original by keescook@chromium.org Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
[This patch depends on luto@mit.edu's no_new_privs patch: https://lkml.org/lkml/2012/1/30/264 The whole series including Andrew's patches can be found here: https://github.com/redpig/linux/tree/seccomp Complete diff here: https://github.com/redpig/linux/compare/1dc65fed...seccomp ] This patch adds support for seccomp mode 2. Mode 2 introduces the ability for unprivileged processes to install system call filtering policy expressed in terms of a Berkeley Packet Filter (BPF) program. This program will be evaluated in the kernel for each system call the task makes and computes a result based on data in the format of struct seccomp_data. A filter program may be installed by calling: struct sock_fprog fprog = { ... }; ... prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog); The return value of the filter program determines if the system call is allowed to proceed or denied. If the first filter program installed allows prctl(2) calls, then the above call may be made repeatedly by a task to further reduce its access to the kernel. All attached programs must be evaluated before a system call will be allowed to proceed. Filter programs will be inherited across fork/clone and execve. However, if the task attaching the filter is unprivileged (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task. This ensures that unprivileged tasks cannot attach filters that affect privileged tasks (e.g., setuid binary). There are a number of benefits to this approach. A few of which are as follows: - BPF has been exposed to userland for a long time - BPF optimization (and JIT'ing) are well understood - Userland already knows its ABI: system call numbers and desired arguments - No time-of-check-time-of-use vulnerable data accesses are possible. - system call arguments are loaded on access only to minimize copying required for system call policy decisions. Mode 2 support is restricted to architectures that enable HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on syscall_get_arguments(). The full desired scope of this feature will add a few minor additional requirements expressed later in this series. Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be the desired additional functionality. No architectures are enabled in this patch. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NIndan Zupancic <indan@nul.nu> Acked-by: NEric Paris <eparis@redhat.com> Reviewed-by: NKees Cook <keescook@chromium.org> v18: - rebase to v3.4-rc2 - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org) - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu) - add a comment for get_u32 regarding endianness (akpm@) - fix other typos, style mistakes (akpm@) - added acked-by v17: - properly guard seccomp filter needed headers (leann@ubuntu.com) - tighten return mask to 0x7fff0000 v16: - no change v15: - add a 4 instr penalty when counting a path to account for seccomp_filter size (indan@nul.nu) - drop the max insns to 256KB (indan@nul.nu) - return ENOMEM if the max insns limit has been hit (indan@nul.nu) - move IP checks after args (indan@nul.nu) - drop !user_filter check (indan@nul.nu) - only allow explicit bpf codes (indan@nul.nu) - exit_code -> exit_sig v14: - put/get_seccomp_filter takes struct task_struct (indan@nul.nu,keescook@chromium.org) - adds seccomp_chk_filter and drops general bpf_run/chk_filter user - add seccomp_bpf_load for use by net/core/filter.c - lower max per-process/per-hierarchy: 1MB - moved nnp/capability check prior to allocation (all of the above: indan@nul.nu) v13: - rebase on to 88ebdda6 v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com) - removed copy_seccomp (keescook@chromium.org,indan@nul.nu) - reworded the prctl_set_seccomp comment (indan@nul.nu) v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com) - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu) - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu) - pare down Kconfig doc reference. - extra comment clean up v10: - seccomp_data has changed again to be more aesthetically pleasing (hpa@zytor.com) - calling convention is noted in a new u32 field using syscall_get_arch. This allows for cross-calling convention tasks to use seccomp filters. (hpa@zytor.com) - lots of clean up (thanks, Indan!) v9: - n/a v8: - use bpf_chk_filter, bpf_run_filter. update load_fns - Lots of fixes courtesy of indan@nul.nu: -- fix up load behavior, compat fixups, and merge alloc code, -- renamed pc and dropped __packed, use bool compat. -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch dependencies v7: (massive overhaul thanks to Indan, others) - added CONFIG_HAVE_ARCH_SECCOMP_FILTER - merged into seccomp.c - minimal seccomp_filter.h - no config option (part of seccomp) - no new prctl - doesn't break seccomp on systems without asm/syscall.h (works but arg access always fails) - dropped seccomp_init_task, extra free functions, ... - dropped the no-asm/syscall.h code paths - merges with network sk_run_filter and sk_chk_filter v6: - fix memory leak on attach compat check failure - require no_new_privs || CAP_SYS_ADMIN prior to filter installation. (luto@mit.edu) - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com) - cleaned up Kconfig (amwang@redhat.com) - on block, note if the call was compat (so the # means something) v5: - uses syscall_get_arguments (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org) - uses union-based arg storage with hi/lo struct to handle endianness. Compromises between the two alternate proposals to minimize extra arg shuffling and account for endianness assuming userspace uses offsetof(). (mcgrathr@chromium.org, indan@nul.nu) - update Kconfig description - add include/seccomp_filter.h and add its installation - (naive) on-demand syscall argument loading - drop seccomp_t (eparis@redhat.com) v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS - now uses current->no_new_privs (luto@mit.edu,torvalds@linux-foundation.com) - assign names to seccomp modes (rdunlap@xenotime.net) - fix style issues (rdunlap@xenotime.net) - reworded Kconfig entry (rdunlap@xenotime.net) v3: - macros to inline (oleg@redhat.com) - init_task behavior fixed (oleg@redhat.com) - drop creator entry and extra NULL check (oleg@redhat.com) - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com) - adds tentative use of "always_unprivileged" as per torvalds@linux-foundation.org and luto@mit.edu v2: - (patch 2 only) Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Replaces the seccomp_t typedef with struct seccomp to match modern kernel style. Signed-off-by: NWill Drewry <wad@chromium.org> Reviewed-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NEric Paris <eparis@redhat.com> v18: rebase ... v14: rebase/nochanges v13: rebase on to 88ebdda6 v12: rebase on to linux-next v8-v11: no changes v7: struct seccomp_struct -> struct seccomp v6: original inclusion in this series. Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Any other users of bpf_*_filter that take a struct sock_fprog from userspace will need to be able to also accept a compat_sock_fprog if the arch supports compat calls. This change allows the existing compat_sock_fprog be shared. Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NEric Paris <eparis@redhat.com> v18: tasered by the apostrophe police v14: rebase/nochanges v13: rebase on to 88ebdda6 v12: rebase on to linux-next v11: introduction Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Will Drewry 提交于
Introduces a new BPF ancillary instruction that all LD calls will be mapped through when skb_run_filter() is being used for seccomp BPF. The rewriting will be done using a secondary chk_filter function that is run after skb_chk_filter. The code change is guarded by CONFIG_SECCOMP_FILTER which is added, along with the seccomp_bpf_load() function later in this series. This is based on http://lkml.org/lkml/2012/3/2/141Suggested-by: NIndan Zupancic <indan@nul.nu> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NEric Paris <eparis@redhat.com> v18: rebase ... v15: include seccomp.h explicitly for when seccomp_bpf_load exists. v14: First cut using a single additional instruction ... v13: made bpf functions generic. Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
由 Andy Lutomirski 提交于
With this change, calling prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) disables privilege granting operations at execve-time. For example, a process will not be able to execute a setuid binary to change their uid or gid if this bit is set. The same is true for file capabilities. Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that LSMs respect the requested behavior. To determine if the NO_NEW_PRIVS bit is set, a task may call prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); It returns 1 if set and 0 if it is not set. If any of the arguments are non-zero, it will return -1 and set errno to -EINVAL. (PR_SET_NO_NEW_PRIVS behaves similarly.) This functionality is desired for the proposed seccomp filter patch series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the system call behavior for itself and its child tasks without being able to impact the behavior of a more privileged task. Another potential use is making certain privileged operations unprivileged. For example, chroot may be considered "safe" if it cannot affect privileged tasks. Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is set and AppArmor is in use. It is fixed in a subsequent patch. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NWill Drewry <wad@chromium.org> Acked-by: NEric Paris <eparis@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> v18: updated change desc v17: using new define values as per 3.4 Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 06 4月, 2012 6 次提交
-
-
由 Ben Hutchings 提交于
Commit e52ac339 ('net: Use device model to get driver name in skb_gso_segment()') removed the only in-tree caller of ethtool ops that doesn't hold the RTNL lock. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Ball 提交于
This reverts commit c16e981b2fd9455af670a69a84f4c8cf07e12658, because it's no longer useful once MSI support is reverted. Signed-off-by: NChris Ball <cjb@laptop.org> -
由 Manuel Lauss 提交于
MSI on my O2Micro OZ600 SD card reader is broken. This patch adds a quirk to disable MSI on these controllers. Signed-off-by: NManuel Lauss <manuel.lauss@googlemail.com> Signed-off-by: NChris Ball <cjb@laptop.org>
-
由 Eric Dumazet 提交于
commit 2f533844 (tcp: allow splice() to build full TSO packets) added a regression for splice() calls using SPLICE_F_MORE. We need to call tcp_flush() at the end of the last page processed in tcp_sendpages(), or else transmits can be deferred and future sends stall. Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but with different semantic. For all sendpage() providers, its a transparent change. Only sock_sendpage() and tcp_sendpages() can differentiate the two different flags provided by pipe_to_sendpage() Reported-by: NTom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NEric Dumazet <eric.dumazet@gmail>com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Hocko 提交于
Although mem_cgroup_uncharge_swap has an empty placeholder for !CONFIG_CGROUP_MEM_RES_CTLR_SWAP the definition is placed in the CONFIG_SWAP ifdef block so we are missing the same definition for !CONFIG_SWAP which implies !CONFIG_CGROUP_MEM_RES_CTLR_SWAP. This has not been an issue before, because mem_cgroup_uncharge_swap was not called from !CONFIG_SWAP context. But Hugh Dickins has a cleanup patch to call __mem_cgroup_commit_charge_swapin which is defined also for !CONFIG_SWAP. Let's move both the empty definition and declaration outside of the CONFIG_SWAP block to avoid the following compilation error: mm/memcontrol.c: In function '__mem_cgroup_commit_charge_swapin': mm/memcontrol.c:2837: error: implicit declaration of function 'mem_cgroup_uncharge_swap' if CONFIG_SWAP is disabled. Reported-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NMichal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Boyd 提交于
debugfs and a few other drivers use an open-coded version of simple_open() to pass a pointer from the file to the read/write file ops. Add support for this simple case to libfs so that we can remove the many duplicate copies of this simple function. Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 4月, 2012 4 次提交
-
-
由 Eric Dumazet 提交于
Commit f04565dd (dev: use name hash for dev_seq_ops) added a second regression, as some devices are missing from /proc/net/dev if many devices are defined. When seq_file buffer is filled, the last ->next/show() method is canceled (pos value is reverted to value prior ->next() call) Problem is after above commit, we dont restart the lookup at right position in ->start() method. Fix this by removing the internal 'pos' pointer added in commit, since we need to use the 'loff_t *pos' provided by seq_file layer. This also reverts commit 5cac98dd (net: Fix corruption in /proc/*/net/dev_mcast), since its not needed anymore. Reported-by: NBen Greear <greearb@candelatech.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Mihai Maruseac <mmaruseac@ixiacom.com> Tested-by: NBen Greear <greearb@candelatech.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
It just bloats the audit data structure for no good reason, since the only time those fields are filled are just before calling the common_lsm_audit() function, which is also the only user of those fields. So just make them be the arguments to common_lsm_audit(), rather than bloating that structure that is passed around everywhere, and is initialized in hot paths. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Eric Paris 提交于
After shrinking the common_audit_data stack usage for private LSM data I'm not going to shrink the data union. To do this I'm going to move anything larger than 2 void * ptrs to it's own structure and require it to be declared separately on the calling stack. Thus hot paths which don't need more than a couple pointer don't have to declare space to hold large unneeded structures. I could get this down to one void * by dealing with the key struct and the struct path. We'll see if that is helpful after taking care of networking. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Paris 提交于
Linus found that the gigantic size of the common audit data caused a big perf hit on something as simple as running stat() in a loop. This patch requires LSMs to declare the LSM specific portion separately rather than doing it in a union. Thus each LSM can be responsible for shrinking their portion and don't have to pay a penalty just because other LSMs have a bigger space requirement. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2012 2 次提交
-
-
由 Paul Gortmaker 提交于
Commit 313162d0 ("device.h: audit and cleanup users in main include dir") exchanged an include <linux/device.h> for a struct *device but in actuality I misread this file when creating 313162d0 and it should have remained an include. There were no build regressions since all consumers were already getting device.h anyway, but make it right regardless. Reported-by: NStefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Gortmaker 提交于
Commit bf4289cb ("ATMEL: fix nand ecc support") indicated that it wanted to "Move platform data to a common header include/linux/platform_data/atmel_nand.h" and the new header even had re-include protectors with: #ifndef __ATMEL_NAND_H__ However, the file that was added was simply called atmel.h and this caused avr32 defconfig to fail with: In file included from arch/avr32/boards/atstk1000/setup.c:22: arch/avr32/mach-at32ap/include/mach/board.h:10:44: error: linux/platform_data/atmel_nand.h: No such file or directory In file included from arch/avr32/boards/atstk1000/setup.c:22: arch/avr32/mach-at32ap/include/mach/board.h:121: warning: 'struct atmel_nand_data' declared inside parameter list arch/avr32/mach-at32ap/include/mach/board.h:121: warning: its scope is only this definition or declaration, which is probably not what you want make[2]: *** [arch/avr32/boards/atstk1000/setup.o] Error 1 It seems the scope of the file contents will expand beyond just nand, so ignore the original intention, and fix up the users who reference the bad name with the _nand suffix. CC: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> CC: David Woodhouse <dwmw2@infradead.org> Acked-by: NHans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 4月, 2012 1 次提交
-
-
由 Pavel Shilovsky 提交于
We can deadlock if we have a write oplock and two processes use the same file handle. In this case the first process can't unlock its lock if the second process blocked on the lock in the same time. Fix it by using posix_lock_file rather than posix_lock_file_wait under cinode->lock_mutex. If we request a blocking lock and posix_lock_file indicates that there is another lock that prevents us, wait untill that lock is released and restart our call. Cc: stable@kernel.org Acked-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NPavel Shilovsky <piastry@etersoft.ru> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 01 4月, 2012 4 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
... implemented that way since the next commit will leave it almost alone in ext2_fs.h - most of the file (including struct ext2_super_block) is going to move to fs/ext2/ext2.h. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Thierry Reding 提交于
Since the on-disk format has been stable for quite some time, users should either use the headers provided by libext2fs or keep a private copy of this header. For the full discussion, see this thread: https://lkml.org/lkml/2012/3/21/516 While at it, this commit removes all __KERNEL__ guards, which are now unnecessary. Signed-off-by: NThierry Reding <thierry.reding@avionic-design.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Ted Ts'o <tytso@mit.edu> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Andreas Dilger <aedilger@gmail.com> Cc: linux-ext4@vger.kernel.org
-
- 31 3月, 2012 1 次提交
-
-
由 Amit Shah 提交于
The thaw operation was used by the balloon driver, but after the last commit there's no reason to have separate thaw and restore callbacks. Signed-off-by: NAmit Shah <amit.shah@redhat.com>
-
- 30 3月, 2012 7 次提交
-
-
由 Boris Ostrovsky 提交于
power_usage is always assigned a negative value and should be declared a signed integer Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Boris Ostrovsky 提交于
Currently when a CPU is off-lined it enters either MWAIT-based idle or, if MWAIT is not desired or supported, HLT-based idle (which places the processor in C1 state). This patch allows processors without MWAIT support to stay in states deeper than C1. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Daniel Lezcano 提交于
As far as I can see, this field is never used in the code. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Daniel Lezcano 提交于
All the modules name are ro-data, it is never copied to the array. eg. static struct cpuidle_driver intel_idle_driver = { .name = "intel_idle", .owner = THIS_MODULE, }; It safe to assign the pointer of this ro-data to a const char *. By this way we save 12 bytes. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Acked-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com> Tested-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: NLen Brown <len.brown@intel.com> -
由 ShuoX Liu 提交于
Some C states of new CPU might be not good. One reason is BIOS might configure them incorrectly. To help developers root cause it quickly, the patch adds a new sysfs entry, so developers could disable specific C state manually. In addition, C state might have much impact on performance tuning, as it takes much time to enter/exit C states, which might delay interrupt processing. With the new debug option, developers could check if a deep C state could impact performance and how much impact it could cause. Also add this option in Documentation/cpuidle/sysfs.txt. [akpm@linux-foundation.org: check kstrtol return value] Signed-off-by: NShuoX Liu <shuox.liu@intel.com> Reviewed-by: NYanmin Zhang <yanmin_zhang@intel.com> Reviewed-and-Tested-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Jason Wessel 提交于
There has long been a limitation using software breakpoints with a kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For this particular patch, it will apply cleanly and has been tested all the way back to 2.6.36. The kprobes code uses the text_poke() function which accommodates writing a breakpoint into a read-only page. The x86 kgdb code can solve the problem similarly by overriding the default breakpoint set/remove routines and using text_poke() directly. The x86 kgdb code will first attempt to use the traditional probe_kernel_write(), and next try using a the text_poke() function. The break point install method is tracked such that the correct break point removal routine will get called later on. Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org # >= 2.6.36 Inspried-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses the the probe_kernel_write() to install breakpoints, but this is not enough for all the archs. Some arch such as x86 need to use text_poke() in order to install a breakpoint into a read only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Cc: stable@vger.kernel.org # >= 2.6.36 Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
- 29 3月, 2012 5 次提交
-
-
由 Steffen Klassert 提交于
The default netlink message size limit might be exceeded when dumping a lot of algorithms to userspace. As a result, not all of the instantiated algorithms dumped to userspace. So calculate an upper bound on the message size and call netlink_dump_start() with that value. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Axel Lin 提交于
Signed-off-by: NAxel Lin <axel.lin@gmail.com> Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
-
由 Rusty Russell 提交于
These are obsolete: cpu_*_mask provides (const) pointers. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> -
由 Konstantin Khlebnikov 提交于
A series of radix tree cleanups, and usage of them in the core pagecache code. Micro-benchmark: lookup 14 slots (typical page-vector size) in radix-tree there earch <step> slot filled and tagged before/after - nsec per full scan through tree * Intel Sandy Bridge i7-2620M 4Mb L3 New code always faster * AMD Athlon 6000+ 2x1Mb L2, without L3 New code generally faster, Minor degradation (marked with "*") for huge sparse trees * i386 on Sandy Bridge New code faster for common cases: tagged and dense trees. Some degradations for non-tagged lookup on sparse trees. Ideally, there might help __ffs() analog for searching first non-zero long element in array, gcc sometimes cannot optimize this loop corretly. Numbers: CPU: Intel Sandy Bridge i7-2620M 4Mb L3 radix-tree with 1024 slots: tagged lookup step 1 before 7156 after 3613 step 2 before 5399 after 2696 step 3 before 4779 after 1928 step 4 before 4456 after 1429 step 5 before 4292 after 1213 step 6 before 4183 after 1052 step 7 before 4157 after 951 step 8 before 4016 after 812 step 9 before 3952 after 851 step 10 before 3937 after 732 step 11 before 4023 after 709 step 12 before 3872 after 657 step 13 before 3892 after 633 step 14 before 3720 after 591 step 15 before 3879 after 578 step 16 before 3561 after 513 normal lookup step 1 before 4266 after 3301 step 2 before 2695 after 2129 step 3 before 2083 after 1712 step 4 before 1801 after 1534 step 5 before 1628 after 1313 step 6 before 1551 after 1263 step 7 before 1475 after 1185 step 8 before 1432 after 1167 step 9 before 1373 after 1092 step 10 before 1339 after 1134 step 11 before 1292 after 1056 step 12 before 1319 after 1030 step 13 before 1276 after 1004 step 14 before 1256 after 987 step 15 before 1228 after 992 step 16 before 1247 after 999 radix-tree with 1024*1024*128 slots: tagged lookup step 1 before 1086102841 after 674196409 step 2 before 816839155 after 498138306 step 7 before 599728907 after 240676762 step 15 before 555729253 after 185219677 step 63 before 606637748 after 128585664 step 64 before 608384432 after 102945089 step 65 before 596987114 after 123996019 step 128 before 304459225 after 56783056 step 256 before 158846855 after 31232481 step 512 before 86085652 after 18950595 step 12345 before 6517189 after 1674057 normal lookup step 1 before 626064869 after 544418266 step 2 before 418809975 after 336321473 step 7 before 242303598 after 207755560 step 15 before 208380563 after 176496355 step 63 before 186854206 after 167283638 step 64 before 176188060 after 170143976 step 65 before 185139608 after 167487116 step 128 before 88181865 after 86913490 step 256 before 45733628 after 45143534 step 512 before 24506038 after 23859036 step 12345 before 2177425 after 2018662 * AMD Athlon 6000+ 2x1Mb L2, without L3 radix-tree with 1024 slots: tag-lookup step 1 before 8164 after 5379 step 2 before 5818 after 5581 step 3 before 4959 after 4213 step 4 before 4371 after 3386 step 5 before 4204 after 2997 step 6 before 4950 after 2744 step 7 before 4598 after 2480 step 8 before 4251 after 2288 step 9 before 4262 after 2243 step 10 before 4175 after 2131 step 11 before 3999 after 2024 step 12 before 3979 after 1994 step 13 before 3842 after 1929 step 14 before 3750 after 1810 step 15 before 3735 after 1810 step 16 before 3532 after 1660 normal-lookup step 1 before 7875 after 5847 step 2 before 4808 after 4071 step 3 before 4073 after 3462 step 4 before 3677 after 3074 step 5 before 4308 after 2978 step 6 before 3911 after 3807 step 7 before 3635 after 3522 step 8 before 3313 after 3202 step 9 before 3280 after 3257 step 10 before 3166 after 3083 step 11 before 3066 after 3026 step 12 before 2985 after 2982 step 13 before 2925 after 2924 step 14 before 2834 after 2808 step 15 before 2805 after 2803 step 16 before 2647 after 2622 radix-tree with 1024*1024*128 slots: tag-lookup step 1 before 1288059720 after 951736580 step 2 before 961292300 after 884212140 step 7 before 768905140 after 547267580 step 15 before 771319480 after 456550640 step 63 before 504847640 after 242704304 step 64 before 392484800 after 177920786 step 65 before 491162160 after 246895264 step 128 before 208084064 after 97348392 step 256 before 112401035 after 51408126 step 512 before 75825834 after 29145070 step 12345 before 5603166 after 2847330 normal-lookup step 1 before 1025677120 after 861375100 step 2 before 647220080 after 572258540 step 7 before 505518960 after 484041813 step 15 before 430483053 after 444815320 * step 63 before 388113453 after 404250546 * step 64 before 374154666 after 396027440 * step 65 before 381423973 after 396704853 * step 128 before 190078700 after 202619384 * step 256 before 100886756 after 102829108 * step 512 before 64074505 after 56158720 step 12345 before 4237289 after 4422299 * * i686 on Sandy bridge radix-tree with 1024 slots: tagged lookup step 1 before 7990 after 4019 step 2 before 5698 after 2897 step 3 before 5013 after 2475 step 4 before 4630 after 1721 step 5 before 4346 after 1759 step 6 before 4299 after 1556 step 7 before 4098 after 1513 step 8 before 4115 after 1222 step 9 before 3983 after 1390 step 10 before 4077 after 1207 step 11 before 3921 after 1231 step 12 before 3894 after 1116 step 13 before 3840 after 1147 step 14 before 3799 after 1090 step 15 before 3797 after 1059 step 16 before 3783 after 745 normal lookup step 1 before 5103 after 3499 step 2 before 3299 after 2550 step 3 before 2489 after 2370 step 4 before 2034 after 2302 * step 5 before 1846 after 2268 * step 6 before 1752 after 2249 * step 7 before 1679 after 2164 * step 8 before 1627 after 2153 * step 9 before 1542 after 2095 * step 10 before 1479 after 2109 * step 11 before 1469 after 2009 * step 12 before 1445 after 2039 * step 13 before 1411 after 2013 * step 14 before 1374 after 2046 * step 15 before 1340 after 1975 * step 16 before 1331 after 2000 * radix-tree with 1024*1024*128 slots: tagged lookup step 1 before 1225865377 after 667153553 step 2 before 842427423 after 471533007 step 7 before 609296153 after 276260116 step 15 before 544232060 after 226859105 step 63 before 519209199 after 141343043 step 64 before 588980279 after 141951339 step 65 before 521099710 after 138282060 step 128 before 298476778 after 83390628 step 256 before 149358342 after 43602609 step 512 before 76994713 after 22911077 step 12345 before 53286669 after 1472111 normal lookup step 1 before 819284564 after 533635310 step 2 before 512421605 after 364956155 step 7 before 271443305 after 305721345 * step 15 before 223591630 after 273960216 * step 63 before 190320247 after 217770207 * step 64 before 178538168 after 267411372 * step 65 before 186400423 after 215347937 * step 128 before 88106045 after 140540612 * step 256 before 44812420 after 70660377 * step 512 before 24435438 after 36328275 * step 12345 before 2123924 after 2148062 * bloat-o-meter delta for this patchset + patchset with related shmem cleanups bloat-o-meter: x86_64 add/remove: 4/3 grow/shrink: 5/6 up/down: 928/-939 (-11) function old new delta radix_tree_next_chunk - 499 +499 shmem_unuse 428 554 +126 shmem_radix_tree_replace 131 227 +96 find_get_pages_tag 354 419 +65 find_get_pages_contig 345 407 +62 find_get_pages 362 396 +34 __kstrtab_radix_tree_next_chunk - 22 +22 __ksymtab_radix_tree_next_chunk - 16 +16 __kcrctab_radix_tree_next_chunk - 8 +8 radix_tree_gang_lookup_slot 204 203 -1 static.shmem_xattr_set 384 381 -3 radix_tree_gang_lookup_tag_slot 208 191 -17 radix_tree_gang_lookup 231 187 -44 radix_tree_gang_lookup_tag 247 199 -48 shmem_unlock_mapping 278 190 -88 __lookup 217 - -217 __lookup_tag 242 - -242 radix_tree_locate_item 279 - -279 bloat-o-meter: i386 add/remove: 3/3 grow/shrink: 8/9 up/down: 1075/-1275 (-200) function old new delta radix_tree_next_chunk - 757 +757 shmem_unuse 352 449 +97 find_get_pages_contig 269 322 +53 shmem_radix_tree_replace 113 154 +41 find_get_pages_tag 277 318 +41 dcache_dir_lseek 426 458 +32 __kstrtab_radix_tree_next_chunk - 22 +22 vc_do_resize 968 977 +9 snd_pcm_lib_read1 725 733 +8 __ksymtab_radix_tree_next_chunk - 8 +8 netlbl_cipsov4_list 1120 1127 +7 find_get_pages 293 291 -2 new_slab 467 459 -8 bitfill_unaligned_rev 425 417 -8 radix_tree_gang_lookup_tag_slot 177 146 -31 blk_dump_cmd 267 229 -38 radix_tree_gang_lookup_slot 212 134 -78 shmem_unlock_mapping 221 128 -93 radix_tree_gang_lookup_tag 275 162 -113 radix_tree_gang_lookup 255 126 -129 __lookup 227 - -227 __lookup_tag 271 - -271 radix_tree_locate_item 277 - -277 This patch: Implement a clean, simple and effective radix-tree iteration routine. Iterating divided into two phases: * lookup next chunk in radix-tree leaf node * iterating through slots in this chunk Main iterator function radix_tree_next_chunk() returns pointer to first slot, and stores in the struct radix_tree_iter index of next-to-last slot. For tagged-iterating it also constuct bitmask of tags for retunted chunk. All additional logic implemented as static-inline functions and macroses. Also adds radix_tree_find_next_bit() static-inline variant of find_next_bit() optimized for small constant size arrays, because find_next_bit() too heavy for searching in an array with one/two long elements. [akpm@linux-foundation.org: rework comments a bit] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Lezcano 提交于
In the case of a child pid namespace, rebooting the system does not really makes sense. When the pid namespace is used in conjunction with the other namespaces in order to create a linux container, the reboot syscall leads to some problems. A container can reboot the host. That can be fixed by dropping the sys_reboot capability but we are unable to correctly to poweroff/ halt/reboot a container and the container stays stuck at the shutdown time with the container's init process waiting indefinitively. After several attempts, no solution from userspace was found to reliabily handle the shutdown from a container. This patch propose to make the init process of the child pid namespace to exit with a signal status set to : SIGINT if the child pid namespace called "halt/poweroff" and SIGHUP if the child pid namespace called "reboot". When the reboot syscall is called and we are not in the initial pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART", and "RESTART2". Otherwise we return EINVAL. Returning EINVAL is also an easy way to check if this feature is supported by the kernel when invoking another 'reboot' option like CAD. By this way the parent process of the child pid namespace knows if it rebooted or not and can take the right decision. Test case: ========== #include <alloca.h> #include <stdio.h> #include <sched.h> #include <unistd.h> #include <signal.h> #include <sys/reboot.h> #include <sys/types.h> #include <sys/wait.h> #include <linux/reboot.h> static int do_reboot(void *arg) { int *cmd = arg; if (reboot(*cmd)) printf("failed to reboot(%d): %m\n", *cmd); } int test_reboot(int cmd, int sig) { long stack_size = 4096; void *stack = alloca(stack_size) + stack_size; int status; pid_t ret; ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd); if (ret < 0) { printf("failed to clone: %m\n"); return -1; } if (wait(&status) < 0) { printf("unexpected wait error: %m\n"); return -1; } if (!WIFSIGNALED(status)) { printf("child process exited but was not signaled\n"); return -1; } if (WTERMSIG(status) != sig) { printf("signal termination is not the one expected\n"); return -1; } return 0; } int main(int argc, char *argv[]) { int status; status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1); if (status >= 0) { printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n"); return 1; } printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n"); return 0; } [akpm@linux-foundation.org: tweak and add comments] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Tested-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-