- 13 11月, 2013 3 次提交
-
-
由 Ryan Mallon 提交于
Some setuid binaries will allow reading of files which have read permission by the real user id. This is problematic with files which use %pK because the file access permission is checked at open() time, but the kptr_restrict setting is checked at read() time. If a setuid binary opens a %pK file as an unprivileged user, and then elevates permissions before reading the file, then kernel pointer values may be leaked. This happens for example with the setuid pppd application on Ubuntu 12.04: $ head -1 /proc/kallsyms 00000000 T startup_32 $ pppd file /proc/kallsyms pppd: In file /proc/kallsyms: unrecognized option 'c1000000' This will only leak the pointer value from the first line, but other setuid binaries may leak more information. Fix this by adding a check that in addition to the current process having CAP_SYSLOG, that effective user and group ids are equal to the real ids. If a setuid binary reads the contents of a file which uses %pK then the pointer values will be printed as NULL if the real user is unprivileged. Update the sysctl documentation to reflect the changes, and also correct the documentation to state the kptr_restrict=0 is the default. This is a only temporary solution to the issue. The correct solution is to do the permission check at open() time on files, and to replace %pK with a function which checks the open() time permission. %pK uses in printk should be removed since no sane permission check can be done, and instead protected by using dmesg_restrict. Signed-off-by: NRyan Mallon <rmallon@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Greg Thelen 提交于
Tests various percpu operations. Enable with CONFIG_PERCPU_TEST=m. Signed-off-by: NGreg Thelen <gthelen@google.com> Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
It has been reported on very large machines that show_mem is taking almost 5 minutes to display information. This is a serious problem if there is an OOM storm. The bulk of the cost is in show_mem doing a very expensive PFN walk to give us the following information Total RAM: Also available as totalram_pages Highmem pages: Also available as totalhigh_pages Reserved pages: Can be inferred from the zone structure Shared pages: PFN walk required Unshared pages: PFN walk required Quick pages: Per-cpu walk required Only the shared/unshared pages requires a full PFN walk but that information is useless. It is also inaccurate as page pins of unshared pages would be accounted for as shared. Even if the information was accurate, I'm struggling to think how the shared/unshared information could be useful for debugging OOM conditions. Maybe it was useful before rmap existed when reclaiming shared pages was costly but it is less relevant today. The PFN walk could be optimised a bit but why bother as the information is useless. This patch deletes the PFN walker and infers the total RAM, highmem and reserved pages count from struct zone. It omits the shared/unshared page usage on the grounds that it is useless. It also corrects the reporting of HighMem as HighMem/MovableOnly as ZONE_MOVABLE has similar problems to HighMem with respect to lowmem/highmem exhaustion. Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 11月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit cb26a311. It mysteriously causes NetworkManager to not find the wireless device for me. As far as I can tell, Tejun *meant* for this commit to not make any semantic changes, but there clearly are some. So revert it, taking into account some of the calling convention changes that happened in this area in subsequent commits. Cc: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 11月, 2013 1 次提交
-
-
由 Ming Lei 提交于
Commit b1adaf65 ("[SCSI] block: add sg buffer copy helper functions") introduces two sg buffer copy helpers, and calls flush_kernel_dcache_page() on pages in SG list after these pages are written to. Unfortunately, the commit may introduce a potential bug: - Before sending some SCSI commands, kmalloc() buffer may be passed to block layper, so flush_kernel_dcache_page() can see a slab page finally - According to cachetlb.txt, flush_kernel_dcache_page() is only called on "a user page", which surely can't be a slab page. - ARCH's implementation of flush_kernel_dcache_page() may use page mapping information to do optimization so page_mapping() will see the slab page, then VM_BUG_ON() is triggered. Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled, and this patch fixes the bug by adding test of '!PageSlab(miter->page)' before calling flush_kernel_dcache_page(). Signed-off-by: NMing Lei <ming.lei@canonical.com> Reported-by: NAaro Koskinen <aaro.koskinen@iki.fi> Tested-by: NSimon Baatz <gmbnomis@gmail.com> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [3.2+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 10月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
Without the timer debugging, the delayed kobject release will just result in undebuggable oopses if it triggers any latent bugs. That doesn't actually help debugging at all. So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid having people enable one without the other. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 10月, 2013 2 次提交
-
-
由 Matias Bjorling 提交于
Export the interface to be used within modules. Signed-off-by: NMatias Bjorling <m@bjorling.me> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Hutchings 提交于
Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE) into a Kconfig variable. Original version by Bastian Blank <waldi@debian.org>. Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 10月, 2013 1 次提交
-
-
由 Steven Whitehouse 提交于
Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 12 10月, 2013 1 次提交
-
-
由 Fengguang Wu 提交于
Useful for locating buggy drivers on kernel oops. It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is hopefully only enabled in debug kernels (like maybe the Fedora rawhide one, or at developers), so being a bit more verbose is likely ok. Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 11 10月, 2013 1 次提交
-
-
由 Fengguang Wu 提交于
Useful for locating buggy drivers on kernel oops. It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is hopefully only enabled in debug kernels (like maybe the Fedora rawhide one, or at developers), so being a bit more verbose is likely ok. Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 10月, 2013 1 次提交
-
-
由 Tejun Heo 提交于
sysfs currently has a rather weird behavior regarding removals. A directory removal would delete all files directly under it but wouldn't recurse into subdirectories, which, while a bit inconsistent, seems to make sense at the first glance as each directory is supposedly associated with a kobject and each kobject can take care of the directory deletion; however, this doesn't really hold as we have groups which can be directories without a kobject associated with it and require explicit deletions. We're in the process of separating out sysfs from kboject / driver core and want a consistent behavior. A removal should delete either only the specified node or everything under it. I think it is helpful to support recursive atomic removal and later patches will implement it. Such change means that a sysfs_dirent associated with kobject may be deleted before the kobject itself is removed if one of its ancestor gets removed before it. As sysfs_remove_dir() puts the base ref, we may end up with dangling pointer on descendants. This can be solved by holding an extra reference on the sd from kobject. Acquire an extra reference on the associated sysfs_dirent on directory creation and put it after removal. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 9月, 2013 3 次提交
-
-
由 Heiko Carstens 提交于
Make use of arch_mutex_cpu_relax() so architectures can override the default cpu_relax() semantics. This is especially useful for s390, where cpu_relax() means that we yield() the current (virtual) cpu and therefore is very expensive, and would contradict the whole purpose of the lockless cmpxchg loop. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Eric W. Biederman 提交于
In kobj_ns_current_may_mount the default should be to allow the mount. The test is only for a single kobj_ns_type at a time, and unless there is a reason to prevent it the mounting sysfs should be allowed. Subsystems that are not registered can't have are not involved so can't have a reason to prevent mounting sysfs. This is a bug-fix to commit 7dc5dbc8 ("sysfs: Restrict mounting sysfs") that came in via the userns tree during the 3.12 merge window. Reported-and-tested-by: NJames Hogan <james.hogan@imgtec.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
The 64-bit cmpxchg operation on the lockref is ordered by virtue of hazarding between the cmpxchg operation and the reference count manipulation. On weakly ordered memory architectures (such as ARM), it can be of great benefit to omit the barrier instructions where they are not needed. This patch moves the lockless lockref code over to a cmpxchg64_relaxed operation, which doesn't provide barrier semantics. If the operation isn't defined, we simply #define it as the usual 64-bit cmpxchg macro. Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 9月, 2013 4 次提交
-
-
由 Jeff Mahoney 提交于
A common way to handle kobject lifetimes in embedded in objects with different lifetime rules is to pair the kobject with a struct completion. This introduces a kobj_completion structure that can be used in place of the pairing, along with several convenience functions for initialization, release, and put-and-wait. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Tejun Heo 提交于
The way namespace tags are implemented in sysfs is more complicated than necessary. As each tag is a pointer value and required to be non-NULL under a namespace enabled parent, there's no need to record separately what type each tag is or where namespace is enabled. If multiple namespace types are needed, which currently aren't, we can simply compare the tag to a set of allowed tags in the superblock assuming that the tags, being pointers, won't have the same value across multiple types. Also, whether to filter by namespace tag or not can be trivially determined by whether the node has any tagged children or not. This patch rips out kobj_ns_type handling from sysfs. sysfs no longer cares whether specific type of namespace is enabled or not. If a sysfs_dirent has a non-NULL tag, the parent is marked as needing namespace filtering and the value is tested against the allowed set of tags for the superblock (currently only one but increasing this number isn't difficult) and the sysfs_dirent is ignored if it doesn't match. This removes most kobject namespace knowledge from sysfs proper which will enable proper separation and layering of sysfs. The namespace sanity checks in fs/sysfs/dir.c are replaced by the new sanity check in kobject_namespace(). As this is the only place ktype->namespace() is called for sysfs, this doesn't weaken the sanity check significantly. I omitted converting the sanity check in sysfs_do_create_link_sd(). While the check can be shifted to upper layer, mistakes there are well contained and should be easily visible anyway. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Tejun Heo 提交于
For some unrecognizable reason, namespace information is communicated to sysfs through ktype->namespace() callback when there's *nothing* which needs the use of a callback. The whole sequence of operations is completely synchronous and sysfs operations simply end up calling back into the layer which just invoked it in order to find out the namespace information, which is completely backwards, obfuscates what's going on and unnecessarily tangles two separate layers. This patch doesn't remove ktype->namespace() but shifts its handling to kobject layer. We probably want to get rid of the callback in the long term. This patch adds an explicit param to sysfs_{create|rename|move}_dir() and renames them to sysfs_{create|rename|move}_dir_ns(), respectively. ktype->namespace() invocations are moved to the calling sites of the above functions. A new helper kboject_namespace() is introduced which directly tests kobj_ns_type_operations->type which should give the same result as testing sysfs_fs_type(parent_sd) and returns @kobj's namespace tag as necessary. kobject_namespace() is extern as it will be used from another file in the following patches. This patch should be an equivalent conversion without any functional difference. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Eric W. Biederman 提交于
In kobj_ns_current_may_mount the default should be to allow the mount. The test is only for a single kobj_ns_type at a time, and unless there is a reason to prevent it the mounting sysfs should be allowed. Subsystems that are not registered can't have are not involved so can't have a reason to prevent mounting sysfs. This is a bug-fix to: commit 7dc5dbc8 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Mar 25 20:07:01 2013 -0700 sysfs: Restrict mounting sysfs Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights over the net namespace. The principle here is if you create or have capabilities over it you can mount it, otherwise you get to live with what other people have mounted. Instead of testing this with a straight forward ns_capable call, perform this check the long and torturous way with kobject helpers, this keeps direct knowledge of namespaces out of sysfs, and preserves the existing sysfs abstractions. Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> That came in via the userns tree during the 3.12 merge window. Reported-by: NJames Hogan <james.hogan@imgtec.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 9月, 2013 1 次提交
-
-
由 Peter Zijlstra 提交于
Replace the single preempt_count() 'function' that's an lvalue with two proper functions: preempt_count() - returns the preempt_count value as rvalue preempt_count_set() - Allows setting the preempt-count value Also provide preempt_count_ptr() as a convenience wrapper to implement all modifying operations. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-orxrbycjozopqfhb4dxdkdvb@git.kernel.org [ Fixed build failure. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 21 9月, 2013 2 次提交
-
-
由 Andre Naujoks 提交于
To be able to use the hex ascii functions in case sensitive environments the array hex_asc_upper[] and the needed functions for hex_byte_pack_upper() are introduced. Signed-off-by: NAndre Naujoks <nautsch2@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Will Deacon 提交于
The cmpxchg() function tends not to support 64-bit arguments on 32-bit architectures. This could be either due to use of unsigned long arguments (like on ARM) or lack of instruction support (cmpxchgq on x86). However, these architectures may implement a specific cmpxchg64() function to provide 64-bit cmpxchg support instead. Since the lockref code requires a 64-bit cmpxchg and relies on the architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64 instead of cmpxchg and allow 32-bit architectures to make use of the lockless lockref implementation. Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 9月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 12 9月, 2013 11 次提交
-
-
由 Herbert Xu 提交于
Unfortunately, even with a softdep some distros fail to include the necessary modules in the initrd. Therefore this patch adds a fallback path to restore existing behaviour where we cannot load the new crypto crct10dif algorithm. In order to do this, the underlying crct10dif has been split out from the crypto implementation so that it can be used on the fallback path. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Sergey Senozhatsky 提交于
LZ4 compression and decompression functions require different in signedness input/output parameters: unsigned char for compression and signed char for decompression. Change decompression API to require "(const) unsigned char *". Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yann Collet <yann.collet.73@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is one such possible user), the following race can happen: radix_tree_preload() ... radix_tree_insert() radix_tree_node_alloc() if (rtp->nr) { ret = rtp->nodes[rtp->nr - 1]; <interrupt> ... radix_tree_preload() ... radix_tree_insert() radix_tree_node_alloc() if (rtp->nr) { ret = rtp->nodes[rtp->nr - 1]; And we give out one radix tree node twice. That clearly results in radix tree corruption with different results (usually OOPS) depending on which two users of radix tree race. We fix the problem by making radix_tree_node_alloc() always allocate fresh radix tree nodes when in interrupt. Using preloading when in interrupt doesn't make sense since all the allocations have to be atomic anyway and we cannot steal nodes from process-context users because some users rely on radix_tree_insert() succeeding after radix_tree_preload(). in_interrupt() check is somewhat ugly but we cannot simply key off passed gfp_mask as that is acquired from root_gfp_mask() and thus the same for all preload users. Another part of the fix is to avoid node preallocation in radix_tree_preload() when passed gfp_mask doesn't allow waiting. Again, preallocation in such case doesn't make sense and when preallocation would happen in interrupt we could possibly leak some allocated nodes. However, some users of radix_tree_preload() require following radix_tree_insert() to succeed. To avoid unexpected effects for these users, radix_tree_preload() only warns if passed gfp mask doesn't allow waiting and we provide a new function radix_tree_maybe_preload() for those users which get different gfp mask from different call sites and which are prepared to handle radix_tree_insert() failure. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cody P Schafer 提交于
No reason require rbtree test code to be a module, allow it to be builtin (streamlines my development process) Signed-off-by: NCody P Schafer <cody@linux.vnet.ibm.com> Reviewed-by: NSeth Jennings <sjenning@linux.vnet.ibm.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cody P Schafer 提交于
Just check that we examine all nodes in the tree for the postorder iteration. Signed-off-by: NCody P Schafer <cody@linux.vnet.ibm.com> Reviewed-by: NSeth Jennings <sjenning@linux.vnet.ibm.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cody P Schafer 提交于
Postorder iteration yields all of a node's children prior to yielding the node itself, and this particular implementation also avoids examining the leaf links in a node after that node has been yielded. In what I expect will be its most common usage, postorder iteration allows the deletion of every node in an rbtree without modifying the rbtree nodes (no _requirement_ that they be nulled) while avoiding referencing child nodes after they have been "deleted" (most commonly, freed). I have only updated zswap to use this functionality at this point, but numerous bits of code (most notably in the filesystem drivers) use a hand rolled postorder iteration that NULLs child links as it traverses the tree. Each of those instances could be replaced with this common implementation. 1 & 2 add rbtree postorder iteration functions. 3 adds testing of the iteration to the rbtree runtime tests 4 allows building the rbtree runtime tests as builtins 5 updates zswap. This patch: Add postorder iteration functions for rbtree. These are useful for safely freeing an entire rbtree without modifying the tree at all. Signed-off-by: NCody P Schafer <cody@linux.vnet.ibm.com> Reviewed-by: NSeth Jennings <sjenning@linux.vnet.ibm.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Courbot 提交于
When decompressing into memory, the output buffer length is set to some arbitrarily high value (0x7fffffff) to indicate the output is, virtually, unlimited in size. The problem with this is that some platforms have their physical memory at high physical addresses (0x80000000 or more), and that the output buffer address and its "unlimited" length cannot be added without overflowing. An example of this can be found in inflate_fast(): /* next_out is the output buffer address */ out = strm->next_out - OFF; /* avail_out is the output buffer size. end will overflow if the output * address is >= 0x80000104 */ end = out + (strm->avail_out - 257); This has huge consequences on the performance of kernel decompression, since the following exit condition of inflate_fast() will be always true: } while (in < last && out < end); Indeed, "end" has overflowed and is now always lower than "out". As a result, inflate_fast() will return after processing one single byte of input data, and will thus need to be called an unreasonably high number of times. This probably went unnoticed because kernel decompression is fast enough even with this issue. Nonetheless, adjusting the output buffer length in such a way that the above pointer arithmetic never overflows results in a kernel decompression that is about 3 times faster on affected machines. Signed-off-by: NAlexandre Courbot <acourbot@nvidia.com> Tested-by: NJon Medhurst <tixy@linaro.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gu Zheng 提交于
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Emilio López 提交于
The documentation mentions a "name" parameter, which does not exist. This commit removes such mention from the function documentation. Signed-off-by: NEmilio López <emilio@elopez.com.ar> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Use the helper function instead of __GFP_ZERO. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonyoung Shim 提交于
In struct gen_pool_chunk, end_addr means the end address of memory chunk (inclusive), but in the implementation it is treated as address + size of memory chunk (exclusive), so it points to the address plus one instead of correct ending address. The ending address of memory chunk plus one will cause overflow on the memory chunk including the last address of memory map, e.g. when starting address is 0xFFF00000 and size is 0x100000 on 32bit machine, ending address will be 0x100000000. Use correct ending address like starting address + size - 1. [akpm@linux-foundation.org: add comment to struct gen_pool_chunk:end_addr] Signed-off-by: NJoonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 9月, 2013 1 次提交
-
-
由 Kent Overstreet 提交于
Percpu frontend for allocating ids. With percpu allocation (that works), it's impossible to guarantee it will always be possible to allocate all nr_tags - typically, some will be stuck on a remote percpu freelist where the current job can't get to them. We do guarantee that it will always be possible to allocate at least (nr_tags / 2) tags - this is done by keeping track of which and how many cpus have tags on their percpu freelists. On allocation failure if enough cpus have tags that there could potentially be (nr_tags / 2) tags stuck on remote percpu freelists, we then pick a remote cpu at random to steal from. Note that there's no cpu hotplug notifier - we don't care, because steal_tags() will eventually get the down cpu's tags. We _could_ satisfy more allocations if we had a notifier - but we'll still meet our guarantees and it's absolutely not a correctness issue, so I don't think it's worth the extra code. From akpm: "It looks OK to me (that's as close as I get to an ack :)) v6 changes: - Add #include <linux/cpumask.h> to include/linux/percpu_ida.h to make alpha/arc builds happy (Fengguang) - Move second (cpu >= nr_cpu_ids) check inside of first check scope in steal_tags() (akpm + nab) v5 changes: - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm) - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm) - Convert steal_tags() to use cpumask_weight() + cpumask_next() + cpumask_first() + cpumask_clear_cpu() (kmo + akpm) - Add comment for alloc_global_tags() (kmo + akpm) - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm) - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm) - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init() (kmo + akpm) - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy() (kmo + akpm) - Add comment for percpu_ida_alloc @ gfp (kmo + akpm) - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab) v4 changes: - Fix tags.c reference in percpu_ida_init (akpm) Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
- 08 9月, 2013 2 次提交
-
-
由 Linus Torvalds 提交于
The only actual current lockref user (dcache) uses zero reference counts even for perfectly live dentries, because it's a cache: there may not be any users, but that doesn't mean that we want to throw away the dentry. At the same time, the dentry cache does have a notion of a truly "dead" dentry that we must not even increment the reference count of, because we have pruned it and it is not valid. Currently that distinction is not visible in the lockref itself, and the dentry cache validation uses "lockref_get_or_lock()" to either get a new reference to a dentry that already had existing references (and thus cannot be dead), or get the dentry lock so that we can then verify the dentry and increment the reference count under the lock if that verification was successful. That's all somewhat complicated. This adds the concept of being "dead" to the lockref itself, by simply using a count that is negative. This allows a usage scenario where we can increment the refcount of a dentry without having to validate it, and pushing the special "we killed it" case into the lockref code. The dentry code itself doesn't actually use this yet, and it's probably too late in the merge window to do that code (the dentry_kill() code with its "should I decrement the count" logic really is pretty complex code), but let's introduce the concept at the lockref level now. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
The code got rewritten, but the comments got copied as-is from older versions, and as a result the argument name in the comment didn't actually match the code any more. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 9月, 2013 1 次提交
-
-
由 Herbert Xu 提交于
This patch reinstates commits 67822649 39761214 0b95a7f8 31d93962 2d31e518 Now that module softdeps are in the kernel we can use that to resolve the boot issue which cause the revert. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 05 9月, 2013 1 次提交
-
-
由 Vineet Gupta 提交于
Frame pointer on ARC doesn't serve the conventional purpose of stack unwinding due to the typical way ABI designates it's usage. Thus it's explicit usage on ARC is discouraged (gcc is free to use it, for some tricky stack frames even if -fomit-frame-pointer). Hence no point enabling it for ARC. References: http://www.spinics.net/lists/kernel/msg1593937.htmlSigned-off-by: NVineet Gupta <vgupta@synopsys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paul.mckenney@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michel Lespinasse <walken@google.com> Cc: linux-kernel@vger.kernel.org
-
- 04 9月, 2013 1 次提交
-
-
由 Al Viro 提交于
New formats: %p[dD][234]?. The next pointer is interpreted as struct dentry * or struct file * resp. ('d' => dentry, 'D' => file) and the last component(s) of pathname are printed (%pd => just the last one, %pd2 => the last two, etc.) Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-