- 13 10月, 2012 5 次提交
-
-
由 Jeff Layton 提交于
Keep a pointer to the audit_names "slot" in struct filename. Have all of the audit_inode callers pass a struct filename ponter to audit_inode instead of a string pointer. If the aname field is already populated, then we can skip walking the list altogether and just use it directly. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
Currently, if we call getname() on a userland string more than once, we'll get multiple copies of the string and multiple audit_names records. Add a function that will allow the audit_names code to satisfy getname requests using info from the audit_names list, avoiding a new allocation and audit_names records. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* allow kernel_execve() leave the actual return to userland to caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated accordingly. * architecture that does select GENERIC_KERNEL_EXECVE in its Kconfig should have its ret_from_kernel_thread() do this: call schedule_tail call the callback left for it by copy_thread(); if it ever returns, that's because it has just done successful kernel_execve() jump to return from syscall IOW, its only difference from ret_from_fork() is that it does call the callback. * such an architecture should also get rid of ret_from_kernel_execve() and __ARCH_WANT_KERNEL_EXECVE This is the last part of infrastructure patches in that area - from that point on work on different architectures can live independently. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 10月, 2012 14 次提交
-
-
由 Jason Wessel 提交于
It is possible to miss data when using the kdb pager. The kdb pager does not pay attention to the maximum column constraint of the screen or serial terminal. This result is not incrementing the shown lines correctly and the pager will print more lines that fit on the screen. Obviously that is less than useful when using a VGA console where you cannot scroll back. The pager will now look at the kdb_buffer string to see how many characters are printed. It might not be perfect considering you can output ASCII that might move the cursor position, but it is a substantially better approximation for viewing dmesg and trace logs. This also means that the vt screen needs to set the kdb COLUMNS variable. Cc: <stable@vger.kernel.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
If you press 'q' the pager should exit instead of printing everything from dmesg which can really bog down a 9600 baud serial link. The same is true for the bta command. Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
Allow gdb to auto load kernel modules when it is attached, which makes it trivially easy to debug module init functions or pre-set breakpoints in a kernel module that has not loaded yet. Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jeff Layton 提交于
Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
In order to accomodate retrying path-based syscalls, we need to add a new "type" argument to audit_inode_child. This will tell us whether we're looking for a child entry that represents a create or a delete. If we find a parent, don't automatically assume that we need to create a new entry. Instead, use the information we have to try to find an existing entry first. Update it if one is found and create a new one if not. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
In the cases where we already know the length of the parent, pass it as a parm so we don't need to recompute it. In the cases where we don't know the length, pass in AUDIT_NAME_FULL (-1) to indicate that it should be determined. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Paris 提交于
Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
All the callers set this to NULL now. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
Currently, this gets set mostly by happenstance when we call into audit_inode_child. While that might be a little more efficient, it seems wrong. If the syscall ends up failing before audit_inode_child ever gets called, then you'll have an audit_names record that shows the full path but has the parent inode info attached. Fix this by passing in a parent flag when we call audit_inode that gets set to the value of LOOKUP_PARENT. We can then fix up the pathname for the audit entry correctly from the get-go. While we're at it, clean up the no-op macro for audit_inode in the !CONFIG_AUDITSYSCALL case. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
For now, we just have two possibilities: UNKNOWN: for a new audit_names record that we don't know anything about yet NORMAL: for everything else In later patches, we'll add other types so we can distinguish and update records created under different circumstances. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
Most of the callers get called with an inode and dentry in the reverse order. The compiler then has to reshuffle the arg registers and/or stack in order to pass them on to audit_inode_child. Reverse those arguments for a micro-optimization. Reported-by: NEric Paris <eparis@redhat.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
If name is NULL then the condition in the loop will never be true. Also, with this change, we can eliminate the check for n->name == NULL since the equivalence check will never be true if it is. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jeff Layton 提交于
In some cases, we were passing in NULL even when we have a dentry. Reported-by: NEric Paris <eparis@redhat.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Most of them never returned anyway - only two functions had to be changed. That allows to simplify their callers a whole lot. Note that this does *not* apply to kthread_run() callbacks - all of those had been called from the same kernel_thread() callback, which did do_exit() already. This is strictly about very few low-level kernel_thread() callbacks (there are only 6 of those, mostly as part of kthread.h and kmod.h exported mechanisms, plus kernel_init() itself). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 10月, 2012 12 次提交
-
-
由 Rusty Russell 提交于
It doesn't, because the clean targets don't include kernel/Makefile, and because two files were missing from the list. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 David Howells 提交于
Place an indication that the certificate should use utf8 strings into the x509.genkey template generated by kernel/Makefile. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 David Howells 提交于
Use the same digest type for the autogenerated key signature as for the module signature so that the hash algorithm is guaranteed to be present in the kernel. Without this, the X.509 certificate loader may reject the X.509 certificate so generated because it was self-signed and the signature will be checked against itself - but this won't work if the digest algorithm must be loaded as a module. The symptom is that the key fails to load with the following message emitted into the kernel log: MODSIGN: Problem loading in-kernel X.509 certificate (-65) the error in brackets being -ENOPKG. What you should see is something like: MODSIGN: Loaded cert 'Magarathea: Glacier signing key: 9588321144239a119d3406d4c4cf1fbae1836fa0' Note that this doesn't apply to certificates that are not self-signed as we don't check those currently as they require the parent CA certificate to be available. Reported-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 David Howells 提交于
Check the signature on the module against the keys compiled into the kernel or available in a hardware key store. Currently, only RSA keys are supported - though that's easy enough to change, and the signature is expected to contain raw components (so not a PGP or PKCS#7 formatted blob). The signature blob is expected to consist of the following pieces in order: (1) The binary identifier for the key. This is expected to match the SubjectKeyIdentifier from an X.509 certificate. Only X.509 type identifiers are currently supported. (2) The signature data, consisting of a series of MPIs in which each is in the format of a 2-byte BE word sizes followed by the content data. (3) A 12 byte information block of the form: struct module_signature { enum pkey_algo algo : 8; enum pkey_hash_algo hash : 8; enum pkey_id_type id_type : 8; u8 __pad; __be32 id_length; __be32 sig_length; }; The three enums are defined in crypto/public_key.h. 'algo' contains the public-key algorithm identifier (0->DSA, 1->RSA). 'hash' contains the digest algorithm identifier (0->MD4, 1->MD5, 2->SHA1, etc.). 'id_type' contains the public-key identifier type (0->PGP, 1->X.509). '__pad' should be 0. 'id_length' should contain in the binary identifier length in BE form. 'sig_length' should contain in the signature data length in BE form. The lengths are in BE order rather than CPU order to make dealing with cross-compilation easier. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor Kconfig fix)
-
由 David Howells 提交于
Include a PGP keyring containing the public keys required to perform module verification in the kernel image during build and create a special keyring during boot which is then populated with keys of crypto type holding the public keys found in the PGP keyring. These can be seen by root: [root@andromeda ~]# cat /proc/keys 07ad4ee0 I----- 1 perm 3f010000 0 0 crypto modsign.0: RSA 87b9b3bd [] 15c7f8c3 I----- 1 perm 1f030000 0 0 keyring .module_sign: 1/4 ... It is probably worth permitting root to invalidate these keys, resulting in their removal and preventing further modules from being loaded with that key. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 David Howells 提交于
Automatically generate keys for module signing if they're absent so that allyesconfig doesn't break. The builder should consider generating their own key and certificate, however, so that the keys are appropriately named. The private key for the module signer should be placed in signing_key.priv (unencrypted!) and the public key in an X.509 certificate as signing_key.x509. If a transient key is desired for signing the modules, a config file for 'openssl req' can be placed in x509.genkey, looking something like the following: [ req ] default_bits = 4096 distinguished_name = req_distinguished_name prompt = no x509_extensions = myexts [ req_distinguished_name ] O = Magarathea CN = Glacier signing key emailAddress = slartibartfast@magrathea.h2g2 [ myexts ] basicConstraints=critical,CA:FALSE keyUsage=digitalSignature subjectKeyIdentifier=hash authorityKeyIdentifier=hash The build process will use this to configure: openssl req -new -nodes -utf8 -sha1 -days 36500 -batch \ -x509 -config x509.genkey \ -outform DER -out signing_key.x509 \ -keyout signing_key.priv to generate the key. Note that it is required that the X.509 certificate have a subjectKeyIdentifier and an authorityKeyIdentifier. Without those, the certificate will be rejected. These can be used to check the validity of a certificate. Note that 'make distclean' will remove signing_key.{priv,x509} and x509.genkey, whether or not they were generated automatically. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 David Howells 提交于
If we're in FIPS mode, we should panic if we fail to verify the signature on a module or we're asked to load an unsigned module in signature enforcing mode. Possibly FIPS mode should automatically enable enforcing mode. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
We do a very simple search for a particular string appended to the module (which is cache-hot and about to be SHA'd anyway). There's both a config option and a boot parameter which control whether we accept or fail with unsigned modules and modules that are signed with an unknown key. If module signing is enabled, the kernel will be tainted if a module is loaded that is unsigned or has a signature for which we don't have the key. (Useful feedback and tweaks by David Howells <dhowells@redhat.com>) Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Linus Walleij 提交于
Currently we rely on all IRQ chip instances to dynamically allocate their IRQ descriptors unless they use the linear IRQ domain. So for irqdomain_add_legacy() and irqdomain_add_simple() the caller need to make sure that descriptors are allocated. Let's slightly augment the yet unused irqdomain_add_simple() to also allocate descriptors as a means to simplify usage and avoid code duplication throughout the kernel. We warn if descriptors cannot be allocated, e.g. if a platform has the bad habit of hogging descriptors at boot time. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Lee Jones <lee.jones@linaro.org> Reviewed-by: NRob Herring <rob.herring@calxeda.com> Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
-
由 Sasha Levin 提交于
audit_log_start() may return NULL, this is unchecked by the caller in audit_log_link_denied() and could cause a NULL ptr deref. Introduced by commit a51d9eaa ("fs: add link restriction audit reporting"). Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dan Carpenter 提交于
We fixed a bunch of integer overflows in timekeeping code during the 3.6 cycle. I did an audit based on that and found this potential overflow. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NJohn Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20121009071823.GA19159@elgon.mountainSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
由 Hildner, Christian 提交于
Adding two (or more) timers with large values for "expires" (they have to reside within tv5 in the same list) leads to endless looping between cascade() and internal_add_timer() in case CONFIG_BASE_SMALL is one and jiffies are crossing the value 1 << 18. The bug was introduced between 2.6.11 and 2.6.12 (and survived for quite some time). This patch ensures that when cascade() is called timers within tv5 are not added endlessly to their own list again, instead they are added to the next lower tv level tv4 (as expected). Signed-off-by: NChristian Hildner <christian.hildner@siemens.com> Reviewed-by: NJan Kiszka <jan.kiszka@siemens.com> Link: http://lkml.kernel.org/r/98673C87CB31274881CFFE0B65ECC87B0F5FC1963E@DEFTHW99EA4MSX.ww902.siemens.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
- 09 10月, 2012 9 次提交
-
-
由 Haggai Eran 提交于
In order to allow sleeping during invalidate_page mmu notifier calls, we need to avoid calling when holding the PT lock. In addition to its direct calls, invalidate_page can also be called as a substitute for a change_pte call, in case the notifier client hasn't implemented change_pte. This patch drops the invalidate_page call from change_pte, and instead wraps all calls to change_pte with invalidate_range_start and invalidate_range_end calls. Note that change_pte still cannot sleep after this patch, and that clients implementing change_pte should not take action on it in case the number of outstanding invalidate_range_start calls is larger than one, otherwise they might miss a later invalidation. Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Cc: Andrea Arcangeli <andrea@qumranet.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Liran Liss <liranl@mellanox.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Update the generic interval tree code that was introduced in "mm: replace vma prio_tree with an interval tree". Changes: - fixed 'endpoing' typo noticed by Andrew Morton - replaced include/linux/interval_tree_tmpl.h, which was used as a template (including it automatically defined the interval tree functions) with include/linux/interval_tree_generic.h, which only defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself defines the interval tree functions when invoked. Now that is a very long macro which is unfortunate, but it does make the usage sites (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously. - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro, instead of duplicating that code in the interval tree template. - replaced vma_interval_tree_add(), which was actually handling the nonlinear and interval tree cases, with vma_interval_tree_insert_after() which handles only the interval tree case and has an API that is more consistent with the other interval tree handling functions. The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap(). Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Implement an interval tree as a replacement for the VMA prio_tree. The algorithms are similar to lib/interval_tree.c; however that code can't be directly reused as the interval endpoints are not explicitly stored in the VMA. So instead, the common algorithm is moved into a template and the details (node type, how to get interval endpoints from the node, etc) are filled in using the C preprocessor. Once the interval tree functions are available, using them as a replacement to the VMA prio tree is a relatively simple, mechanical job. Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month. Signed-off-by: NDavidlohr Bueso <dave@gnu.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Currently the kernel sets mm->exe_file during sys_execve() and then tracks number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon as this counter drops to zero kernel resets mm->exe_file to NULL. Plus it resets mm->exe_file at last mmput() when mm->mm_users drops to zero. VMA with VM_EXECUTABLE flag appears after mapping file with flag MAP_EXECUTABLE, such vmas can appears only at sys_execve() or after vma splitting, because sys_mmap ignores this flag. Usually binfmt module sets mm->exe_file and mmaps executable vmas with this file, they hold mm->exe_file while task is running. comment from v2.6.25-6245-g925d1c40 ("procfs task exe symlink"), where all this stuff was introduced: > The kernel implements readlink of /proc/pid/exe by getting the file from > the first executable VMA. Then the path to the file is reconstructed and > reported as the result. > > Because of the VMA walk the code is slightly different on nommu systems. > This patch avoids separate /proc/pid/exe code on nommu systems. Instead of > walking the VMAs to find the first executable file-backed VMA we store a > reference to the exec'd file in the mm_struct. > > That reference would prevent the filesystem holding the executable file > from being unmounted even after unmapping the VMAs. So we track the number > of VM_EXECUTABLE VMAs and drop the new reference when the last one is > unmapped. This avoids pinning the mounted filesystem. exe_file's vma accounting is hooked into every file mmap/unmmap and vma split/merge just to fix some hypothetical pinning fs from umounting by mm, which already unmapped all its executable files, but still alive. Seems like currently nobody depends on this behaviour. We can try to remove this logic and keep mm->exe_file until final mmput(). mm->exe_file is still protected with mm->mmap_sem, because we want to change it via new sys_prctl(PR_SET_MM_EXE_FILE). Also via this syscall task can change its mm->exe_file and unpin mountpoint explicitly. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Some security modules and oprofile still uses VM_EXECUTABLE for retrieving a task's executable file. After this patch they will use mm->exe_file directly. mm->exe_file is protected with mm->mmap_sem, so locking stays the same. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [arch/tile] Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [tomoyo] Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: NJames Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Srivatsa S. Bhat 提交于
The synchronization between CPU hotplug readers and writers is achieved by means of refcounting, safeguarded by the cpu_hotplug.lock. get_online_cpus() increments the refcount, whereas put_online_cpus() decrements it. If we ever hit an imbalance between the two, we end up compromising the guarantees of the hotplug synchronization i.e, for example, an extra call to put_online_cpus() can end up allowing a hotplug reader to execute concurrently with a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases where the refcount can go negative, and also attempt to fix it up, so that we can continue to run. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Catalin Marinas 提交于
Introduce SYSCTL_EXCEPTION_TRACE config option and selec it in the architectures requiring support for the "exception-trace" debug_table entry in kernel/sysctl.c. Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-