- 13 9月, 2015 15 次提交
-
-
由 Morten Rasmussen 提交于
Bring arch_scale_cpu_capacity() in line with the recent change of its arch_scale_freq_capacity() sibling in commit dfbca41f ("sched: Optimize freq invariant accounting") from weak function to #define to allow inlining of the function. While at it, remove the ARCH_CAPACITY sched_feature as well. With the change to #define there isn't a straightforward way to allow runtime switch between an arch implementation and the default implementation of arch_scale_cpu_capacity() using sched_feature. The default was to use the arch-specific implementation, but only the arm architecture provides one and that is essentially equivalent to the default implementation. Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <Dietmar.Eggemann@arm.com> Cc: Juri Lelli <Juri.Lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: daniel.lezcano@linaro.org Cc: mturquette@baylibre.com Cc: pang.xunlei@zte.com.cn Cc: rjw@rjwysocki.net Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1439569394-11974-3-git-send-email-morten.rasmussen@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dietmar Eggemann 提交于
Apply frequency scaling correction factor to per-entity load tracking to make it frequency invariant. Currently, load appears bigger when the CPU is running slower which affects load-balancing decisions. Each segment of the sched_avg.load_sum geometric series is now scaled by the current frequency so that the sched_avg.load_avg of each sched entity will be invariant from frequency scaling. Moreover, cfs_rq.runnable_load_sum is scaled by the current frequency as well. Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NVincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <Dietmar.Eggemann@arm.com> Cc: Juri Lelli <Juri.Lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: daniel.lezcano@linaro.org Cc: mturquette@baylibre.com Cc: pang.xunlei@zte.com.cn Cc: rjw@rjwysocki.net Cc: sgurrappadi@nvidia.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1439569394-11974-2-git-send-email-morten.rasmussen@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Variable sched_numa_balancing toggles numa_balancing feature. Hence moving from a simple read mostly variable to a more apt static_branch. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439310261-16124-1-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Variable sched_numa_balancing is available for both CONFIG_SCHED_DEBUG and !CONFIG_SCHED_DEBUG. All code paths now check for sched_numa_balancing. Hence remove sched_feat(NUMA). Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439290813-6683-4-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Commit 2a1ed24c ("sched/numa: Prefer NUMA hotness over cache hotness") sets sched feature NUMA to true. However this can enable NUMA hinting faults on a UMA system. This commit ensures that NUMA hinting faults occur only on a NUMA system by setting/resetting sched_numa_balancing. This commit: - Makes sched_numa_balancing common to CONFIG_SCHED_DEBUG and !CONFIG_SCHED_DEBUG. Earlier it was only in !CONFIG_SCHED_DEBUG. - Checks for sched_numa_balancing instead of sched_feat(NUMA). Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439290813-6683-3-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Simple rename of the 'numabalancing_enabled' variable to 'sched_numa_balancing'. No functional changes. Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439290813-6683-2-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Vincent Guittot 提交于
Since commit: d4573c3e ("sched: Improve load balancing in the presence of idle CPUs") the ILB CPU starts with the idle load balancing of other idle CPUs and finishes with itself in order to speed up the spread of tasks in all idle CPUs. The this_rq->next_balance is still used in nohz_idle_balance() as an intermediate step to gather the shortest next balance before updating nohz.next_balance. But the former has not been updated yet and is likely to be set with the current jiffies. As a result, the nohz.next_balance will be set with current jiffies instead of the real next balance date. This generates spurious kicks of nohz ilde balance. nohz_idle_balance() must set the nohz.next_balance without taking into account this_rq->next_balance which is not updated yet. Then, this_rq will update nohz.next_update with its next_balance once updated and if necessary. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NJason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1438595750-20455-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
cgroup_exit() is not called from copy_process() after commit: e8604cb4 ("cgroup: fix spurious lockdep warning in cgroup_exit()") from do_exit(). So this check is useless and the comment is obsolete. Signed-off-by: NKirill Tkhai <ktkhai@odin.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/55E444C8.3020402@odin.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The previous patches made the second argument go unused, remove it. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
By observing that switched_from_fair() detaches from a runqueue, and switched_to_fair() attaches to a runqueue, we can see that task_move_group_fair() is one followed by the other with flipping the runqueue in between. Therefore extract all the common bits and implement all three functions in terms of them. This should fix a few corner cases wrt. vruntime normalization; where, when we take a task off of a runqueue we convert to an approximation of lag by subtracting min_vruntime, and when placing a task on the a runqueue to the reverse. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NByungchul Park <byungchul.park@lge.com> [peterz: Changelog] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1440069720-27038-6-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In case there are problems with the aging on attach, provide a debug knob to turn it off. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: yuyang.du@intel.com Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
Where switched_from_fair() will remove the entity's load from the runqueue, switched_to_fair() does not currently add it back. This means that when a task leaves the fair class for a short duration; say because of PI; we loose its load contribution. This can ripple forward and disturb the load tracking because other operations (enqueue, dequeue) assume its factored in. Only once the runqueue empties will the load tracking recover. When we add it back in, age the per entity average to match up with the runqueue age. This has the obvious problem that if the task leaves the fair class for a significant time, the load will age to 0. Employ the normal migration rule for inter-runqueue moves in task_move_group_fair(). Again, there is the obvious problem of the task migrating while not in the fair class. The alternative solution would be to to omit the chunk in attach_entity_load_avg(), which would effectively reset the timestamp and use whatever avg there was. Signed-off-by: NByungchul Park <byungchul.park@lge.com> [ Rewrote the changelog and comments. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1440069720-27038-5-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
Since we attach the entity load to the new runqueue, we should also detatch the entity load from the old runqueue, otherwise load can accumulate. Signed-off-by: NByungchul Park <byungchul.park@lge.com> [ Rewrote the changelog. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1440069720-27038-4-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
Currently we conditionally add the entity load to the rq when moving the task between cgroups. This doesn't make sense as we always 'migrate' the task between cgroups, so we should always migrate the load too. [ The history here is that we used to only migrate the blocked load which was only meaningfull when !queued. ] Signed-off-by: NByungchul Park <byungchul.park@lge.com> [ Rewrote the changelog. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1440069720-27038-3-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Byungchul Park 提交于
Currently we open-code the addition/subtraction of the per entity load to/from the runqueue, factor this out into helper functions. Signed-off-by: NByungchul Park <byungchul.park@lge.com> [ Rewrote the changelog. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1440069720-27038-2-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 9月, 2015 2 次提交
-
-
由 Oleg Nesterov 提交于
check_for_tasks() doesn't need to disable irqs, recursive read_lock() from interrupt is fine. While at it, s/do_each_thread/for_each_process_thread/. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NKirill Tkhai <ktkhai@odin.com> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150910130750.GA20055@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathieu Desnoyers 提交于
Here is an implementation of a new system call, sys_membarrier(), which executes a memory barrier on all threads running on the system. It is implemented by calling synchronize_sched(). It can be used to distribute the cost of user-space memory barriers asymmetrically by transforming pairs of memory barriers into pairs consisting of sys_membarrier() and a compiler barrier. For synchronization primitives that distinguish between read-side and write-side (e.g. userspace RCU [1], rwlocks), the read-side can be accelerated significantly by moving the bulk of the memory barrier overhead to the write-side. The existing applications of which I am aware that would be improved by this system call are as follows: * Through Userspace RCU library (http://urcu.so) - DNS server (Knot DNS) https://www.knot-dns.cz/ - Network sniffer (http://netsniff-ng.org/) - Distributed object storage (https://sheepdog.github.io/sheepdog/) - User-space tracing (http://lttng.org) - Network storage system (https://www.gluster.org/) - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf) - Financial software (https://lkml.org/lkml/2015/3/23/189) Those projects use RCU in userspace to increase read-side speed and scalability compared to locking. Especially in the case of RCU used by libraries, sys_membarrier can speed up the read-side by moving the bulk of the memory barrier cost to synchronize_rcu(). * Direct users of sys_membarrier - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198) Microsoft core dotnet GC developers are planning to use the mprotect() side-effect of issuing memory barriers through IPIs as a way to implement Windows FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their github thread, specifically stating that sys_membarrier() is what they are looking for. To explain the benefit of this scheme, let's introduce two example threads: Thread A (non-frequent, e.g. executing liburcu synchronize_rcu()) Thread B (frequent, e.g. executing liburcu rcu_read_lock()/rcu_read_unlock()) In a scheme where all smp_mb() in thread A are ordering memory accesses with respect to smp_mb() present in Thread B, we can change each smp_mb() within Thread A into calls to sys_membarrier() and each smp_mb() within Thread B into compiler barriers "barrier()". Before the change, we had, for each smp_mb() pairs: Thread A Thread B previous mem accesses previous mem accesses smp_mb() smp_mb() following mem accesses following mem accesses After the change, these pairs become: Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses As we can see, there are two possible scenarios: either Thread B memory accesses do not happen concurrently with Thread A accesses (1), or they do (2). 1) Non-concurrent Thread A vs Thread B accesses: Thread A Thread B prev mem accesses sys_membarrier() follow mem accesses prev mem accesses barrier() follow mem accesses In this case, thread B accesses will be weakly ordered. This is OK, because at that point, thread A is not particularly interested in ordering them with respect to its own accesses. 2) Concurrent Thread A vs Thread B accesses Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses In this case, thread B accesses, which are ensured to be in program order thanks to the compiler barrier, will be "upgraded" to full smp_mb() by synchronize_sched(). * Benchmarks On Intel Xeon E5405 (8 cores) (one thread is calling sys_membarrier, the other 7 threads are busy looping) 1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call. * User-space user of this system call: Userspace RCU library Both the signal-based and the sys_membarrier userspace RCU schemes permit us to remove the memory barrier from the userspace RCU rcu_read_lock() and rcu_read_unlock() primitives, thus significantly accelerating them. These memory barriers are replaced by compiler barriers on the read-side, and all matching memory barriers on the write-side are turned into an invocation of a memory barrier on all active threads in the process. By letting the kernel perform this synchronization rather than dumbly sending a signal to every process threads (as we currently do), we diminish the number of unnecessary wake ups and only issue the memory barriers on active threads. Non-running threads do not need to execute such barrier anyway, because these are implied by the scheduler context switches. Results in liburcu: Operations in 10s, 6 readers, 2 writers: memory barriers in reader: 1701557485 reads, 2202847 writes signal-based scheme: 9830061167 reads, 6700 writes sys_membarrier: 9952759104 reads, 425 writes sys_membarrier (dyn. check): 7970328887 reads, 425 writes The dynamic sys_membarrier availability check adds some overhead to the read-side compared to the signal-based scheme, but besides that, sys_membarrier slightly outperforms the signal-based scheme. However, this non-expedited sys_membarrier implementation has a much slower grace period than signal and memory barrier schemes. Besides diminishing the number of wake-ups, one major advantage of the membarrier system call over the signal-based scheme is that it does not need to reserve a signal. This plays much more nicely with libraries, and with processes injected into for tracing purposes, for which we cannot expect that signals will be unused by the application. An expedited version of this system call can be added later on to speed up the grace period. Its implementation will likely depend on reading the cpu_curr()->mm without holding each CPU's rq lock. This patch adds the system call to x86 and to asm-generic. [1] http://urcu.so membarrier(2) man page: MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2) NAME membarrier - issue memory barriers on a set of threads SYNOPSIS #include <linux/membarrier.h> int membarrier(int cmd, int flags); DESCRIPTION The cmd argument is one of the following: MEMBARRIER_CMD_QUERY Query the set of supported commands. It returns a bitmask of supported commands. MEMBARRIER_CMD_SHARED Execute a memory barrier on all threads running on the system. Upon return from system call, the caller thread is ensured that all running threads have passed through a state where all memory accesses to user-space addresses match program order between entry to and return from the system call (non-running threads are de facto in such a state). This covers threads from all pro=E2=80=90 cesses running on the system. This command returns 0. The flags argument needs to be 0. For future extensions. All memory accesses performed in program order from each targeted thread is guaranteed to be ordered with respect to sys_membarrier(). If we use the semantic "barrier()" to represent a compiler barrier forcing memory accesses to be performed in program order across the barrier, and smp_mb() to represent explicit memory barriers forcing full memory ordering across the barrier, we have the following ordering table for each pair of barrier(), sys_membarrier() and smp_mb(): The pair ordering is detailed as (O: ordered, X: not ordered): barrier() smp_mb() sys_membarrier() barrier() X X O smp_mb() X O O sys_membarrier() O O O RETURN VALUE On success, these system calls return zero. On error, -1 is returned, and errno is set appropriately. For a given command, with flags argument set to 0, this system call is guaranteed to always return the same value until reboot. ERRORS ENOSYS System call is not implemented. EINVAL Invalid arguments. Linux 2015-04-15 MEMBARRIER(2) Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nicholas Miell <nmiell@comcast.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 9月, 2015 15 次提交
-
-
由 Wanpeng Li 提交于
Kernel testing triggered this warning: | WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80() | Modules linked in: | CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c73 #2 | Call Trace: | dump_stack+0x4b/0x75 | warn_slowpath_common+0x8b/0xc0 | warn_slowpath_null+0x22/0x30 | do_set_cpus_allowed+0x7e/0x80 | cpuset_cpus_allowed_fallback+0x7c/0x170 | select_fallback_rq+0x221/0x280 | migration_call+0xe3/0x250 | notifier_call_chain+0x53/0x70 | __raw_notifier_call_chain+0x1e/0x30 | cpu_notify+0x28/0x50 | take_cpu_down+0x22/0x40 | multi_cpu_stop+0xd5/0x140 | cpu_stopper_thread+0xbc/0x170 | smpboot_thread_fn+0x174/0x2f0 | kthread+0xc4/0xe0 | ret_from_kernel_thread+0x21/0x30 As Peterz pointed out: | So the normal rules for changing task_struct::cpus_allowed are holding | both pi_lock and rq->lock, such that holding either stabilizes the mask. | | This is so that wakeup can happen without rq->lock and load-balance | without pi_lock. | | From this we already get the relaxation that we can omit acquiring | rq->lock if the task is not on the rq, because in that case | load-balancing will not apply to it. | | ** these are the rules currently tested in do_set_cpus_allowed() ** | | Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which | unconditionally acquires both locks, we could get away with holding just | rq->lock when on_rq for modification because that'd still exclude | __set_cpus_allowed_ptr(), it would also work against | __kthread_bind_mask() because that assumes !on_rq. | | That said, this is all somewhat fragile. | | Now, I don't think dropping rq->lock is quite as disastrous as it | usually is because !cpu_active at this point, which means load-balance | will not interfere, but that too is somewhat fragile. | | So we end up with a choice of two fragile.. This patch fixes it by following the rules for changing task_struct::cpus_allowed with both pi_lock and rq->lock held. Reported-by: Nkernel test robot <ying.huang@intel.com> Reported-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> [ Modified changelog and patch. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/BLU436-SMTP1660820490DE202E3934ED3806E0@phx.gblSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ilya Dryomov 提交于
The following if (val < 0) *lvalp = (unsigned long)-val; is incorrect because the compiler is free to assume -val to be positive and use a sign-extend instruction for extending the bit pattern. This is a problem if val == INT_MIN: # echo -2147483648 >/proc/sys/dev/scsi/logging_level # cat /proc/sys/dev/scsi/logging_level -18446744071562067968 Cast to unsigned long before negation - that way we first sign-extend and then negate an unsigned, which is well defined. With this: # cat /proc/sys/dev/scsi/logging_level -2147483648 Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Cc: Mikulas Patocka <mikulas@twibright.com> Cc: Robert Xiao <nneonneo@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
In x86_64, since v2.6.26 the KERNEL_IMAGE_SIZE is changed to 512M, and accordingly the MODULES_VADDR is changed to 0xffffffffa0000000. However, in v3.12 Kees Cook introduced kaslr to randomise the location of kernel. And the kernel text mapping addr space is enlarged from 512M to 1G. That means now KERNEL_IMAGE_SIZE is variable, its value is 512M when kaslr support is not compiled in and 1G when kaslr support is compiled in. Accordingly the MODULES_VADDR is changed too to be: #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) So when kaslr is compiled in and enabled, the kernel text mapping addr space and modules vaddr space need be adjusted. Otherwise makedumpfile will collapse since the addr for some symbols is not correct. Hence KERNEL_IMAGE_SIZE need be exported to vmcoreinfo and got in makedumpfile to help calculate MODULES_VADDR. Signed-off-by: NBaoquan He <bhe@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
People reported that crash_notes in /proc/vmcore were corrupted and this cause crash kdump failure. With code debugging and log we got the root cause. This is because percpu variable crash_notes are allocated in 2 vmalloc pages. Currently percpu is based on vmalloc by default. Vmalloc can't guarantee 2 continuous vmalloc pages are also on 2 continuous physical pages. So when 1st kernel exports the starting address and size of crash_notes through sysfs like below: /sys/devices/system/cpu/cpux/crash_notes /sys/devices/system/cpu/cpux/crash_notes_size kdump kernel use them to get the content of crash_notes. However the 2nd part may not be in the next neighbouring physical page as we expected if crash_notes are allocated accross 2 vmalloc pages. That's why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be very huge in update_note_header_size_elf64() and cause note header merging failure or some warnings. In this patch change to call __alloc_percpu() to passed in the align value by rounding crash_notes_size up to the nearest power of two. This makes sure the crash_notes is allocated inside one physical page since sizeof(note_buf_t) in all ARCHS is smaller than PAGE_SIZE. Meanwhile add a BUILD_BUG_ON to break compile if size is bigger than PAGE_SIZE since crash_notes definitely will be in 2 pages. That need be avoided, and need be reported if it's unavoidable. [akpm@linux-foundation.org: use correct comment layout] Signed-off-by: NBaoquan He <bhe@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minfei Huang 提交于
Transforming PFN(Page Frame Number) to struct page is never failure, so we can simplify the code logic to do the image->control_page assignment directly in the loop, and remove the unnecessary conditional judgement. Signed-off-by: NMinfei Huang <mnfhuang@gmail.com> Acked-by: NDave Young <dyoung@redhat.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Young 提交于
There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NDave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Young 提交于
Split kexec_file syscall related code to another file kernel/kexec_file.c so that the #ifdef CONFIG_KEXEC_FILE in kexec.c can be dropped. Sharing variables and functions are moved to kernel/kexec_internal.h per suggestion from Vivek and Petr. [akpm@linux-foundation.org: fix bisectability] [akpm@linux-foundation.org: declare the various arch_kexec functions] [akpm@linux-foundation.org: fix build] Signed-off-by: NDave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Frederic Weisbecker 提交于
The UMH_WAIT_PROC handler runs in its own thread in order to make sure that waiting for the exec kernel thread completion won't block other usermodehelper queued jobs. On older workqueue implementations, worklets couldn't sleep without blocking the rest of the queue. But now the workqueue subsystem handles that. Khelper still had the older limitation due to its singlethread properties but we replaced it to system unbound workqueues. Those are affine to the current node and can block up to some number of instances. They are a good candidate to handle UMH_WAIT_PROC assuming that we have enough system unbound workers to handle lots of parallel usermodehelper jobs. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Frederic Weisbecker 提交于
We need to launch the usermodehelper kernel threads with the widest affinity and this is partly why we use khelper. This workqueue has unbound properties and thus a wide affinity inherited by all its children. Now khelper also has special properties that we aren't much interested in: ordered and singlethread. There is really no need about ordering as all we do is creating kernel threads. This can be done concurrently. And singlethread is a useless limitation as well. The workqueue engine already proposes generic unbound workqueues that don't share these useless properties and handle well parallel jobs. The only worrysome specific is their affinity to the node of the current CPU. It's fine for creating the usermodehelper kernel threads but those inherit this affinity for longer jobs such as requesting modules. This patch proposes to use these node affine unbound workqueues assuming that a node is sufficient to handle several parallel usermodehelper requests. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Frederic Weisbecker 提交于
There seem to be quite some confusions on the comments, likely due to changes that came after them. Now since it's very non obvious why we have 3 levels of asynchronous code to implement usermodehelpers, it's important to comment in detail the reason of this layout. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Frederic Weisbecker 提交于
Khelper is affine to all CPUs. Now since it creates the call_usermodehelper_exec_[a]sync() kernel threads, those inherit the wide affinity. As such explicitly forcing a wide affinity from those kernel threads is like a no-op. Just remove it. It's needless and it breaks CPU isolation users who rely on workqueue affinity tuning. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Frederic Weisbecker 提交于
This patchset does a bunch of cleanups and converts khelper to use system unbound workqueues. The 3 first patches should be uncontroversial. The last 2 patches are debatable. Kmod creates kernel threads that perform userspace jobs and we want those to have a large affinity in order not to contend busy CPUs. This is (partly) why we use khelper which has a wide affinity that the kernel threads it create can inherit from. Now khelper is a dedicated workqueue that has singlethread properties which we aren't interested in. Hence those two debatable changes: _ We would like to use generic workqueues. System unbound workqueues are a very good candidate but they are not wide affine, only node affine. Now probably a node is enough to perform many parallel kmod jobs. _ We would like to remove the wait_for_helper kernel thread (UMH_WAIT_PROC handler) to use the workqueue. It means that if the workqueue blocks, and no other worker can take pending kmod request, we can be screwed. Now if we have 512 threads, this should be enough. This patch (of 5): Underscores on function names aren't much verbose to explain the purpose of a function. And kmod has interesting such flavours. Lets rename the following functions: * __call_usermodehelper -> call_usermodehelper_exec_work * ____call_usermodehelper -> call_usermodehelper_exec_async * wait_for_helper -> call_usermodehelper_exec_sync Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 NeilBrown 提交于
If request_module() successfully runs modprobe, but modprobe exits with a non-zero status, then the return value from request_module() will be that (positive) error status. So the return from request_module can be: negative errno zero for success positive exit code. Signed-off-by: NNeilBrown <neilb@suse.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Commit e0e81739 ("CRED: Add some configurable debugging [try #6]") added the kdebug mechanism to this file back in 2009. The kdebug macro calls no_printk which always evaluates arguments. Most of the kdebug uses have an unnecessary call of atomic_read(&cred->usage) Make the kdebug macro do nothing by defining it with do { if (0) no_printk(...); } while (0) when not enabled. $ size kernel/cred.o* (defconfig x86-64) text data bss dec hex filename 2748 336 8 3092 c14 kernel/cred.o.new 2788 336 8 3132 c3c kernel/cred.o.old Miscellanea: o Neaten the #define kdebug macros while there Signed-off-by: NJoe Perches <joe@perches.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yongjun 提交于
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 9月, 2015 2 次提交
-
-
由 Alexei Starovoitov 提交于
when the verifier log is enabled the print_bpf_insn() is doing bpf_alu_string[BPF_OP(insn->code) >> 4] and bpf_jmp_string[BPF_OP(insn->code) >> 4] where BPF_OP is a 4-bit instruction opcode. Malformed insns can cause out of bounds access. Fix it by sizing arrays appropriately. The bug was found by clang address sanitizer with libfuzzer. Reported-by: NYonghong Song <yhs@plumgrid.com> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
We may already have gotten a proper fd struct through fdget(), so whenever we return at the end of an map operation, we need to call fdput(). However, each map operation from syscall side first probes CHECK_ATTR() to verify that unused fields in the bpf_attr union are zero. In case of malformed input, we return with error, but the lookup to the map_fd was already performed at that time, so that we return without an corresponding fdput(). Fix it by performing an fdget() only right before bpf_map_get(). The fdget() invocation on maps in the verifier is not affected. Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 9月, 2015 2 次提交
-
-
由 Vlastimil Babka 提交于
alloc_pages_exact_node() was introduced in commit 6484eb3e ("page allocator: do not check NUMA node ID when the caller knows the node is valid") as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has lead to mistakes in the past, see for example commits 5265047a ("mm, thp: really limit transparent hugepage allocation to local node") and b360edb4 ("mm, mempolicy: migrate_to_node should only migrate to node"). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has been also considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with the exception of sba_alloc_coherent() which open-codes the check for NUMA_NO_NODE, so it is converted to use alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NRobin Holt <robinmholt@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Cliff Whickman <cpw@sgi.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
When seq_show_option (commit a068acf2: "fs: create and use seq_show_option for escaping") was merged, it did not correctly collide with cgroup's addition of legacy_name (commit 3e1d2eed: "cgroup: introduce cgroup_subsys->legacy_name") changes. This fixes the reported name. Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 9月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
In commit f341861f ("task_work: add a scheduling point in task_work_run()") I fixed a latency problem adding a cond_resched() call. Later, commit ac3d0da8 added yet another loop to reverse a list, bringing back the latency spike : I've seen in some cases this loop taking 275 ms, if for example a process with 2,000,000 files is killed. We could add yet another cond_resched() in the reverse loop, or we can simply remove the reversal, as I do not think anything would depend on order of task_work_add() submitted works. Fixes: ac3d0da8 ("task_work: Make task_work_add() lockless") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NMaciej Żenczykowski <maze@google.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 9月, 2015 3 次提交
-
-
由 Andrea Arcangeli 提交于
This activates the userfaultfd syscall. [sfr@canb.auug.org.au: activate syscall fix] [akpm@linux-foundation.org: don't enable userfaultfd on powerpc] Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
These two flags gets set in vma->vm_flags to tell the VM common code if the userfaultfd is armed and in which mode (only tracking missing faults, only tracking wrprotect faults or both). If neither flags is set it means the userfaultfd is not armed on the vma. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
This adds the vm_userfaultfd_ctx to the vm_area_struct. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-