- 24 9月, 2013 8 次提交
-
-
由 Kirill Tkhai 提交于
Some architectures have sparse cpu mask. UltraSparc's cpuinfo for example: CPU0: online CPU2: online So, set only possible CPUs when CONFIG_RCU_NOCB_CPU_ALL is enabled. Also, check that user passes right 'rcu_nocbs=' option. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> CC: Dipankar Sarma <dipankar@in.ibm.com> [ paulmck: Fix pr_info() issue noted by scripts/checkpatch.pl. ] Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit extends the work done in f7f7bac9 (rcu: Have the RCU tracepoints use the tracepoint_string infrastructure) to cover rcutiny. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
-
由 Paul E. McKenney 提交于
If a system is idle from an RCU perspective for longer than specified by CONFIG_RCU_CPU_STALL_TIMEOUT, and if one CPU starts a grace period just as a second checks for CPU stalls, and if this second CPU happens to see the old value of rsp->jiffies_stall, it will incorrectly report a CPU stall. This is quite rare, but apparently occurs deterministically on systems with about 6TB of memory. This commit therefore orders accesses to the data used to determine whether or not a CPU stall is in progress. Grace-period initialization and cleanup first increments rsp->completed to mark the end of the previous grace period, then records the current jiffies in rsp->gp_start, then records the jiffies at which a stall can be expected to occur in rsp->jiffies_stall, and finally increments rsp->gpnum to mark the start of the new grace period. Now, this ordering by itself does not prevent false positives. For example, if grace-period initialization was delayed between recording rsp->gp_start and rsp->jiffies_stall, the CPU stall warning code might still see an old value of rsp->jiffies_stall. Therefore, this commit also orders the CPU stall warning accesses as well, loading rsp->gpnum and jiffies, then rsp->jiffies_stall, then rsp->gp_start, and finally rsp->completed. This ordering means that the false-positive scenario in the previous paragraph would result in rsp->completed being greater than or equal to rsp->gpnum, which is never valid for a CPU stall, allowing the false positive to be rejected. Furthermore, any fetch that gets an old value of rsp->jiffies_stall must also get an old value of rsp->gpnum, which will again be rejected by the comparison of rsp->gpnum and rsp->completed. Situations where rsp->gp_start is later than rsp->jiffies_stall are also rejected, as are situations where jiffies is less than rsp->jiffies_stall. Although use of unsynchronized accesses means that there are likely still some false-positive scenarios (synchronization has proven to be a very bad idea on large systems), this should get rid of a large class of these scenarios. Reported-by: NFabian Herschel <fabian.herschel@suse.com> Reported-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Tested-by: NJochen Striepe <jochen@tolot.escape.de>
-
由 Paul E. McKenney 提交于
The for_each_rcu_flavor() loop unconditionally scans all flavors, even when the first flavor might have some non-lazy callbacks. Once the loop has seen a non-lazy callback, further passes through the loop cannot change the state. This is not a huge problem, given that there can be at most three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched), but this code is on the path to idle, so speeding it up even a small amount would have some benefit. This commit therefore does two things: 1. Rearranges the order of the list of RCU flavors in order to place the most active flavor first in the list. The most active RCU flavor is RCU-preempt, or, if there is no RCU-preempt, RCU-sched. 2. Reworks the for_each_rcu_flavor() to exit early when the first non-lazy callback is seen, or, in the case where the caller does not care about non-lazy callbacks (RCU_FAST_NO_HZ=n), when the first callback is seen. Reported-by: NChen Gang <gang.chen@asianux.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The "idle" variable in both rcu_eqs_enter_common() and rcu_eqs_exit_common() is only used in a WARN_ON_ONCE(). If the kernel is built disabling WARN_ON_ONCE(), the compiler will complain (rightly) that "idle" is unused. This commit therefore adds a __maybe_unused to the declaration of both variables. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Christoph Lameter 提交于
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calcualtions are avoided and less registers are used when code is generated. At the end of the patchset all uses of __get_cpu_var have been removed so the macro is removed too. The patchset includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, u); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(this_cpu_ptr(&x), y, sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to this_cpu_inc(y) Signed-off-by: NChristoph Lameter <cl@linux.com> [ paulmck: Address conflicts. ] Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit replaces an incorrect (but fortunately functional) bitwise OR ("|") operator with the correct logical OR ("||"). Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The rcu_cpu_stall_timeout kernel parameter, the rcu_dynticks per-CPU variable, and the rcu_gp_fqs() function are used only locally. This commit therefore marks them as static. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 21 9月, 2013 1 次提交
-
-
由 Paul E. McKenney 提交于
One of the ->gp_flags assignments used a raw number rather than the cpp macro that was intended for this purpose, which this commit fixes. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 13 9月, 2013 5 次提交
-
-
由 Martin Schwidefsky 提交于
After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Jingoo Han 提交于
The usage of strict_strto*() is not preferred, because strict_strto*() is obsolete. Thus, kstrto*() should be used. Signed-off-by: NJingoo Han <jg1.han@samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sha Zhengju 提交于
This function dereferences res far too often, so optimize it. Signed-off-by: NSha Zhengju <handai.szj@taobao.com> Signed-off-by: NQiang Huang <h.huangqiang@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sha Zhengju 提交于
Since PAGE_ALIGN is aligning up(the next page boundary), so after PAGE_ALIGN, the value might be overflow, such as write the MAX value to *.limit_in_bytes. $ cat /cgroup/memory/memory.limit_in_bytes 18446744073709551615 # echo 18446744073709551615 > /cgroup/memory/memory.limit_in_bytes bash: echo: write error: Invalid argument Some user programs might depend on such behaviours(like libcg, we read the value in snapshot, then use the value to reset cgroup later), and that will cause confusion. So we need to fix it. Signed-off-by: NSha Zhengju <handai.szj@taobao.com> Signed-off-by: NQiang Huang <h.huangqiang@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sha Zhengju 提交于
RESOURCE_MAX is far too general name, change it to RES_COUNTER_MAX. Signed-off-by: NSha Zhengju <handai.szj@taobao.com> Signed-off-by: NQiang Huang <h.huangqiang@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 9月, 2013 23 次提交
-
-
由 Oleg Nesterov 提交于
Currently utask->depth is simply the number of allocated/pending return_instance's in uprobe_task->return_instances list. handle_trampoline() should decrement this counter every time we handle/free an instance, but due to typo it does this only if ->chained == T. This means that in the likely case this counter is never decremented and the probed task can't report more than MAX_URETPROBE_DEPTH events. Reported-by: NMikhail Kulemin <Mikhail.Kulemin@ru.ibm.com> Reported-by: NHemant Kumar Shaw <hkshaw@linux.vnet.ibm.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NAnton Arapov <anton@redhat.com> Cc: masami.hiramatsu.pt@hitachi.com Cc: srikar@linux.vnet.ibm.com Cc: systemtap@sourceware.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20130911154726.GA8093@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kees Cook 提交于
Since the panic handlers may produce additional information (via printk) for the kernel log, it should be reported as part of the panic output saved by kmsg_dump(). Without this re-ordering, nothing that adds information to a panic will show up in pstore's view when kmsg_dump runs, and is therefore not visible to crash reporting tools that examine pstore output. Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Acked-by: NTony Luck <tony.luck@intel.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Vikram Mulukutla <markivx@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xishi Qiu 提交于
Code can not run here forever, so remove the unnecessary return. Signed-off-by: NXishi Qiu <qiuxishi@huawei.com> Suggested-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: NSimon Horman <horms@verge.net.au> Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mark Grondona 提交于
__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task != current, this can can lead to surprising results. For example, a sub-thread can't readlink("/proc/self/exe") if the executable is not readable. setup_new_exec()->would_dump() notices that inode_permission(MAY_READ) fails and then it does set_dumpable(suid_dumpable). After that get_dumpable() fails. (It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we could add PTRACE_MODE_NODUMPABLE) Change __ptrace_may_access() to use same_thread_group() instead of "task == current". Any security check is pointless when the tasks share the same ->mm. Signed-off-by: NMark Grondona <mgrondona@llnl.gov> Signed-off-by: NBen Woodard <woodard@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
The current two insn slot caches both use module_alloc/module_free to allocate and free insn slot cache pages. For s390 this is not sufficient since there is the need to allocate insn slots that are either within the vmalloc module area or within dma memory. Therefore add a mechanism which allows to specify an own allocator for an own insn slot cache. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
The current kpropes insn caches allocate memory areas for insn slots with module_alloc(). The assumption is that the kernel image and module area are both within the same +/- 2GB memory area. This however is not true for s390 where the kernel image resides within the first 2GB (DMA memory area), but the module area is far away in the vmalloc area, usually somewhere close below the 4TB area. For new pc relative instructions s390 needs insn slots that are within +/- 2GB of each area. That way we can patch displacements of pc-relative instructions within the insn slots just like x86 and powerpc. The module area works already with the normal insn slot allocator, however there is currently no way to get insn slots that are within the first 2GB on s390 (aka DMA area). Therefore this patch set modifies the kprobes insn slot cache code in order to allow to specify a custom allocator for the insn slot cache pages. In addition architecure can now have private insn slot caches withhout the need to modify common code. Patch 1 unifies and simplifies the current insn and optinsn caches implementation. This is a preparation which allows to add more insn caches in a simple way. Patch 2 adds the possibility to specify a custom allocator. Patch 3 makes s390 use the new insn slot mechanisms and adds support for pc-relative instructions with long displacements. This patch (of 3): The two insn caches (insn, and optinsn) each have an own mutex and alloc/free functions (get_[opt]insn_slot() / free_[opt]insn_slot()). Since there is the need for yet another insn cache which satifies dma allocations on s390, unify and simplify the current implementation: - Move the per insn cache mutex into struct kprobe_insn_cache. - Move the alloc/free functions to kprobe.h so they are simply wrappers for the generic __get_insn_slot/__free_insn_slot functions. The implementation is done with a DEFINE_INSN_CACHE_OPS() macro which provides the alloc/free functions for each cache if needed. - move the struct kprobe_insn_cache to kprobe.h which allows to generate architecture specific insn slot caches outside of the core kprobes code. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
No functional changes, just comments. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Trivial. Remove the unnecessary "work = NULL" initialization and turn read_barrier_depends() into smp_read_barrier_depends() in task_work_cancel(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Daney 提交于
As in commit f21afc25 ("smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu()"), we don't want to enable irqs if they are not already enabled. I don't know of any bugs currently caused by this unconditional local_irq_enable(), but I want to use this function in MIPS/OCTEON early boot (when we have early_boot_irqs_disabled). This also makes this function have similar semantics to on_each_cpu() which is good in itself. Signed-off-by: NDavid Daney <david.daney@cavium.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Uwe Kleine-König 提交于
At least on ARM no-MMU the extable is empty and so there is nothing to sort. So add a check for the table to be empty which effectively only changes that the misleading pr_notice is suppressed. Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Daney <david.daney@cavium.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Daney 提交于
All of the other non-trivial !SMP versions of functions in smp.h are out-of-line in up.c. Move on_each_cpu() there as well. This allows us to get rid of the #include <linux/irqflags.h>. The drawback is that this makes both the x86_64 and i386 defconfig !SMP kernels about 200 bytes larger each. Signed-off-by: NDavid Daney <david.daney@cavium.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Daney 提交于
The SMP version of this function doesn't unconditionally enable irqs, so neither should this !SMP version. There are no know problems caused by this, but we make the change for consistency's sake. Signed-off-by: NDavid Daney <david.daney@cavium.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Daney 提交于
As in commit f21afc25 ("smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu()"), we don't want to enable irqs if they are not already enabled. There are currently no known problematical callers of these functions, but since it is a known failure pattern, we preemptively fix them. Since they are not trivial functions, make them non-inline by moving them to up.c. This also makes it so we don't have to fix #include dependancies for preempt_{disable,enable}. Signed-off-by: NDavid Daney <david.daney@cavium.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
When running with GENERIC_LOCKBREAK=y, the locking implementations emit calls to arch_{read,write,spin}_relax when spinning on a contended lock in order to allow architectures to favour the CPU owning the lock if possible. In reality, everybody apart from PowerPC and S390 just does cpu_relax() here, so make that the default behaviour and allow it to be overridden if required. Signed-off-by: NWill Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chen Gang 提交于
When failure occurs in hotplug_cfd(), need release related resources, or will cause memory leak. Signed-off-by: NChen Gang <gang.chen@asianux.com> Acked-by: NWang YanQing <udknight@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
const has to use __initconst, not __initdata Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NDavid Howells <dhowells@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mathieu Desnoyers 提交于
I found the following pattern that leads in to interesting findings: grep -r "ret.*|=.*__put_user" * grep -r "ret.*|=.*__get_user" * grep -r "ret.*|=.*__copy" * The __put_user() calls in compat_ioctl.c, ptrace compat, signal compat, since those appear in compat code, we could probably expect the kernel addresses not to be reachable in the lower 32-bit range, so I think they might not be exploitable. For the "__get_user" cases, I don't think those are exploitable: the worse that can happen is that the kernel will copy kernel memory into in-kernel buffers, and will fail immediately afterward. The alpha csum_partial_copy_from_user() seems to be missing the access_ok() check entirely. The fix is inspired from x86. This could lead to information leak on alpha. I also noticed that many architectures map csum_partial_copy_from_user() to csum_partial_copy_generic(), but I wonder if the latter is performing the access checks on every architectures. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Now hugepage migration is enabled, although restricted on pmd-based hugepages for now (due to lack of testing.) So we should allocate migratable hugepages from ZONE_MOVABLE if possible. This patch makes GFP flags in hugepage allocation dependent on migration support, not only the value of hugepages_treat_as_movable. It provides no change on the behavior for architectures which do not support hugepage migration, Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NAndi Kleen <ak@linux.intel.com> Reviewed-by: NWanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xishi Qiu 提交于
Use "zone_end_pfn()" instead of "zone->zone_start_pfn + zone->spanned_pages". Simplify the code, no functional change. [akpm@linux-foundation.org: fix build] Signed-off-by: NXishi Qiu <qiuxishi@huawei.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Simple cleanup. Every user of vma_set_policy() does the same work, this looks a bit annoying imho. And the new trivial helper which does mpol_dup() + vma_set_policy() to simplify the callers. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
do_fork() denies CLONE_THREAD | CLONE_PARENT if NEWUSER | NEWPID. Then later copy_process() denies CLONE_SIGHAND if the new process will be in a different pid namespace (task_active_pid_ns() doesn't match current->nsproxy->pid_ns). This looks confusing and inconsistent. CLONE_NEWPID is very similar to the case when ->pid_ns was already unshared, we want the same restrictions so copy_process() should also nack CLONE_PARENT. And it would be better to deny CLONE_NEWUSER && CLONE_SIGHAND as well just for consistency. Kill the "CLONE_NEWUSER | CLONE_NEWPID" check in do_fork() and change copy_process() to do the same check along with ->pid_ns check we already have. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Colin Walters <walters@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Commit 8382fcac ("pidns: Outlaw thread creation after unshare(CLONE_NEWPID)") nacks CLONE_NEWPID if the forking process unshared pid_ns. This is correct but unnecessary, copy_pid_ns() does the same check. Remove the CLONE_NEWPID check to cleanup the code and prepare for the next change. Test-case: static int child(void *arg) { return 0; } static char stack[16 * 1024]; int main(void) { pid_t pid; assert(unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0); pid = clone(child, stack + sizeof(stack) / 2, CLONE_NEWPID | SIGCHLD, NULL); assert(pid < 0 && errno == EINVAL); return 0; } clone(CLONE_NEWPID) correctly fails with or without this change. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Colin Walters <walters@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Commit 8382fcac ("pidns: Outlaw thread creation after unshare(CLONE_NEWPID)") nacks CLONE_VM if the forking process unshared pid_ns, this obviously breaks vfork: int main(void) { assert(unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0); assert(vfork() >= 0); _exit(0); return 0; } fails without this patch. Change this check to use CLONE_SIGHAND instead. This also forbids CLONE_THREAD automatically, and this is what the comment implies. We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it would be safer to not do this. The current check denies CLONE_SIGHAND implicitely and there is no reason to change this. Eric said "CLONE_SIGHAND is fine. CLONE_THREAD would be even better. Having shared signal handling between two different pid namespaces is the case that we are fundamentally guarding against." Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NColin Walters <walters@redhat.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 9月, 2013 2 次提交
-
-
由 Arnaldo Carvalho de Melo 提交于
The ino_generation field was added in the PERF_RECORD_MMAP2 record in the 13d7a241 cset but no space for it was allocated, corrupting the PERF_FORMAT_{TIME,CPU,TID,etc} area (sample_type/sample_id_all), fix it. Detected with one of the regression tests done by 'perf test': [root@sandy ~]# perf test -v 7 7: Validate PERF_RECORD_* events & perf_sample fields : --- start --- 61315294449606 0 PERF_RECORD_SAMPLE 61315294453161 0 PERF_RECORD_SAMPLE 61315294454441 0 PERF_RECORD_SAMPLE 61315294455709 0 PERF_RECORD_SAMPLE 61315295600899 0 PERF_RECORD_COMM: sleep:6500 27917287430500 342521613 PERF_RECORD_MMAP2 6500/6500: [0x400000(0x7000) @ 0 00:1d 311442 9016]: /usr/bin/sleep MMAP2 going backwards in time, prev=61315295600899, curr=27917287430500 MMAP2 with unexpected cpu, expected 0, got 342521613 MMAP2 with unexpected pid, expected 6500, got 1701606191 MMAP2 with unexpected tid, expected 6500, got 28773 27917287430500 342561333 PERF_RECORD_MMAP2 6500/6500: [0x3b7e000000(0x223000) @ 0 00:1d 309186 9016]: /usr/lib64/ld-2.16.so MMAP2 with unexpected cpu, expected 0, got 342561333 MMAP2 with unexpected pid, expected 6500, got 1932408369 MMAP2 with unexpected tid, expected 6500, got 111 27917287430500 342600095 PERF_RECORD_MMAP2 6500/6500: [0x7fffbd7dc000(0x1000) @ 0x7fffbd7dc000 00:00 0 0]: [vdso] MMAP2 with unexpected cpu, expected 0, got 342600095 MMAP2 with unexpected pid, expected 6500, got 1935963739 MMAP2 with unexpected tid, expected 6500, got 23919 27917287430500 342882834 PERF_RECORD_MMAP2 6500/6500: [0x3b7e400000(0x3b8000) @ 0 00:1d 309187 9016]: /usr/lib64/libc-2.16.so MMAP2 with unexpected cpu, expected 0, got 342882834 MMAP2 with unexpected pid, expected 6500, got 909192754 MMAP2 with unexpected tid, expected 6500, got 7303982 61316297195411 0 PERF_RECORD_EXIT(6500:6500):(6500:6500) ---- end ---- Validate PERF_RECORD_* events & perf_sample fields: FAILED! [root@sandy ~]# After this patch: [root@sandy ~]# perf test 7 7: Validate PERF_RECORD_* events & perf_sample fields : Ok [root@sandy ~]# Acked-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NStephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-heeuv986b8ha7whqg4o3he7c@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Glauber Costa 提交于
This series reworks our current object cache shrinking infrastructure in two main ways: * Noticing that a lot of users copy and paste their own version of LRU lists for objects, we put some effort in providing a generic version. It is modeled after the filesystem users: dentries, inodes, and xfs (for various tasks), but we expect that other users could benefit in the near future with little or no modification. Let us know if you have any issues. * The underlying list_lru being proposed automatically and transparently keeps the elements in per-node lists, and is able to manipulate the node lists individually. Given this infrastructure, we are able to modify the up-to-now hammer called shrink_slab to proceed with node-reclaim instead of always searching memory from all over like it has been doing. Per-node lru lists are also expected to lead to less contention in the lru locks on multi-node scans, since we are now no longer fighting for a global lock. The locks usually disappear from the profilers with this change. Although we have no official benchmarks for this version - be our guest to independently evaluate this - earlier versions of this series were performance tested (details at http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no visible performance regressions while yielding a better qualitative behavior in NUMA machines. With this infrastructure in place, we can use the list_lru entry point to provide memcg isolation and per-memcg targeted reclaim. Historically, those two pieces of work have been posted together. This version presents only the infrastructure work, deferring the memcg work for a later time, so we can focus on getting this part tested. You can see more about the history of such work at http://lwn.net/Articles/552769/ Dave Chinner (18): dcache: convert dentry_stat.nr_unused to per-cpu counters dentry: move to per-sb LRU locks dcache: remove dentries from LRU before putting on dispose list mm: new shrinker API shrinker: convert superblock shrinkers to new API list: add a new LRU list type inode: convert inode lru list to generic lru list code. dcache: convert to use new lru list infrastructure list_lru: per-node list infrastructure shrinker: add node awareness fs: convert inode and dentry shrinking to be node aware xfs: convert buftarg LRU to generic code xfs: rework buffer dispose list tracking xfs: convert dquot cache lru to list_lru fs: convert fs shrinkers to new scan/count API drivers: convert shrinkers to new count/scan API shrinker: convert remaining shrinkers to count/scan API shrinker: Kill old ->shrink API. Glauber Costa (7): fs: bump inode and dentry counters to long super: fix calculation of shrinkable objects for small numbers list_lru: per-node API vmscan: per-node deferred work i915: bail out earlier when shrinker cannot acquire mutex hugepage: convert huge zero page shrinker to new shrinker API list_lru: dynamically adjust node arrays This patch: There are situations in very large machines in which we can have a large quantity of dirty inodes, unused dentries, etc. This is particularly true when umounting a filesystem, where eventually since every live object will eventually be discarded. Dave Chinner reported a problem with this while experimenting with the shrinker revamp patchset. So we believe it is time for a change. This patch just moves int to longs. Machines where it matters should have a big long anyway. Signed-off-by: NGlauber Costa <glommer@openvz.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Chinner <dchinner@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 9月, 2013 1 次提交
-
-
由 Joonsoo Kim 提交于
Commit 23f0d209 ("sched: Factor out code to should_we_balance()") introduces the should_we_balance() function. This function should return 1 if this cpu is appropriate for balancing. But the newly introduced code doesn't do so, it returns 0 instead of 1. This introduces performance regression, reported by Dave Chinner: v4 filesystem v5 filesystem 3.11+xfsdev: 220k files/s 225k files/s 3.12-git 180k files/s 185k files/s 3.12-git-revert 245k files/s 247k files/s You can find more detailed information at: https://lkml.org/lkml/2013/9/10/1 This patch corrects the return value of should_we_balance() function as orignally intended. With this patch, Dave Chinner reports that the regression is gone: v4 filesystem v5 filesystem 3.11+xfsdev: 220k files/s 225k files/s 3.12-git 180k files/s 185k files/s 3.12-git-revert 245k files/s 247k files/s 3.12-git-fix 249k files/s 248k files/s Reported-by: NDave Chinner <dchinner@redhat.com> Tested-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Link: http://lkml.kernel.org/r/20130910065448.GA20368@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-