- 16 10月, 2013 2 次提交
-
-
由 Peter Zijlstra 提交于
There is a subtle race in migrate_swap, when task P, on CPU A, decides to swap places with task T, on CPU B. Task P: - call migrate_swap Task T: - go to sleep, removing itself from the runqueue Task P: - double lock the runqueues on CPU A & B Task T: - get woken up, place itself on the runqueue of CPU C Task P: - see that task T is on a runqueue, and pretend to remove it from the runqueue on CPU B Now CPUs B & C both have corrupted scheduler data structures. This patch fixes it, by holding the pi_lock for both of the tasks involved in the migrate swap. This prevents task T from waking up, and placing itself onto another runqueue, until after migrate_swap has released all locks. This means that, when migrate_swap checks, task T will be either on the runqueue where it was originally seen, or not on any runqueue at all. Migrate_swap deals correctly with of those cases. Tested-by: NJoe Mario <jmario@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: hannes@cmpxchg.org Cc: aarcange@redhat.com Cc: srikar@linux.vnet.ibm.com Cc: tglx@linutronix.de Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/20131010181722.GO13848@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
While discussing the proposed SCHED_DEADLINE patches which in parts mimic the existing FIFO code it was noticed that the wmb in rt_set_overloaded() didn't have a matching barrier. The only site using rt_overloaded() to test the rto_count is pull_rt_task() and we should issue a matching rmb before then assuming there's an rto_mask bit set. Without that smp_rmb() in there we could actually miss seeing the rto_mask bit. Also, change to using smp_[wr]mb(), even though this is SMP only code; memory barriers without smp_ always make me think they're against hardware of some sort. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: vincent.guittot@linaro.org Cc: luca.abeni@unitn.it Cc: bruce.ashfield@windriver.com Cc: dhaval.giani@gmail.com Cc: rostedt@goodmis.org Cc: hgu1972@gmail.com Cc: oleg@redhat.com Cc: fweisbec@gmail.com Cc: darren@dvhart.com Cc: johan.eker@ericsson.com Cc: p.faure@akatech.ch Cc: paulmck@linux.vnet.ibm.com Cc: raistlin@linux.it Cc: claudio@evidence.eu.com Cc: insop.song@gmail.com Cc: michael@amarulasolutions.com Cc: liming.wang@windriver.com Cc: fchecconi@gmail.com Cc: jkacur@redhat.com Cc: tommaso.cucinotta@sssup.it Cc: Juri Lelli <juri.lelli@gmail.com> Cc: harald.gustafsson@ericsson.com Cc: nicola.manica@disi.unitn.it Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 10月, 2013 1 次提交
-
-
由 Kamalesh Babulal 提交于
- 'load_icx' => 'load_idx' - 'calculcate_imbalance' => 'calculate_imbalance' Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1381685775-3544-1-git-send-email-kamalesh@linux.vnet.ibm.com [ Also, don't capitalize 'idle' unnecessarily. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 10月, 2013 1 次提交
-
-
由 Ramkumar Ramachandra 提交于
The balance parameter was removed by 23f0d209 ("sched: Factor out code to should_we_balance()", 2013-08-06). Signed-off-by: NRamkumar Ramachandra <artagnon@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381400433-2030-1-git-send-email-artagnon@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 10月, 2013 14 次提交
-
-
由 Ingo Molnar 提交于
Apply the asm_volatile_goto() compiler quirk to the new rmwcc.h file as well, introduced in: c2daa3be sched, x86: Provide a per-cpu preempt_count implementation Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com> Reported-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: NJakub Jelinek <jakub@redhat.com> Reviewed-by: NRichard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix. Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com> Reported-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: NJakub Jelinek <jakub@redhat.com> Reviewed-by: NRichard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kent Overstreet 提交于
Commit c0f04d88 ("bcache: Fix flushes in writeback mode") was fixing a reported data corruption bug, but it seems some last minute refactoring or rebasing introduced a null pointer deref. Signed-off-by: NKent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Reported-by: NGabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging由 Linus Torvalds 提交于
Pull hwmon fix from Guenter Roeck: "Fix root cause of crash/error seen in applesmc driver" * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (applesmc) Always read until end of data
-
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild由 Linus Torvalds 提交于
Pull kbuild fix from Michal Marek: "Here is an ARM Makefile fix that you even acked. After nobody wanted to take it, it ended up in the kbuild tree" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: arm, kbuild: make "make install" not depend on vmlinux
-
git://www.linux-watchdog.org/linux-watchdog由 Linus Torvalds 提交于
Pull watchdog fix from Wim Van Sebroeck: "Make sure that the hpwdt driver will not load auxilary iLO devices" * git://www.linux-watchdog.org/linux-watchdog: watchdog: hpwdt: Patch to ignore auxilary iLO devices
-
由 Fengguang Wu 提交于
Useful for locating buggy drivers on kernel oops. It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is hopefully only enabled in debug kernels (like maybe the Fedora rawhide one, or at developers), so being a bit more verbose is likely ok. Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mingarelli, Thomas 提交于
This patch is to prevent hpwdt from loading on any auxilary iLO devices defined after the initial (or main) iLO device. All auxilary iLO devices will have a subsystem device ID set to 0x1979 in order for hpwdt to differentiate between the two types. Signed-off-by: NThomas Mingarelli <thomas.mingarelli@hp.com> Tested-by: NLisa Mitchell <lisa.mitchell@hp.com> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random由 Linus Torvalds 提交于
Pull /dev/random changes from Ted Ts'o: "These patches are designed to enable improvements to /dev/random for non-x86 platforms, in particular MIPS and ARM" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: allow architectures to optionally define random_get_entropy() random: run random_int_secret_init() run after all late_initcalls
-
git://git.kernel.org/pub/scm/virt/kvm/kvm由 Linus Torvalds 提交于
Pull kvm fixes from Paolo Bonzini: "Fixes for 3.12-rc5: two old PPC bugs and one new (3.12-rc2) x86 bug" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: ppc: booke: check range page invalidation progress on page setup KVM: PPC: Book3S HV: Fix typo in saving DSCR KVM: nVMX: fix shadow on EPT
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi由 Linus Torvalds 提交于
Pull spi fixes from Mark Brown: "This is all driver updates, mostly fixes for error handling paths except for the s3c64xx and hspi fixes for trying to use runtime PM before it is enabled and the pxa2xx fix for interactions between power management and interrupt handling" * tag 'spi-v3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: atmel: Fix incorrect error path spi/hspi: fixup Runtime PM enable timing spi/s3c64xx: Ensure runtime PM is enabled prior to registration spi/clps711x: drop clk_put for devm_clk_get in spi_clps711x_probe() spi: fix return value check in dspi_probe() spi: mpc512x: fix error return code in mpc512x_psc_spi_do_probe() spi: clps711x: Don't call kfree() after spi_master_put/spi_unregister_master spi/pxa2xx: check status register as well to determine if the device is off
-
由 Theodore Ts'o 提交于
Allow architectures which have a disabled get_cycles() function to provide a random_get_entropy() function which provides a fine-grained, rapidly changing counter that can be used by the /dev/random driver. For example, an architecture might have a rapidly changing register used to control random TLB cache eviction, or DRAM refresh that doesn't meet the requirements of get_cycles(), but which is good enough for the needs of the random driver. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
git://git.infradead.org/linux-mtd由 Linus Torvalds 提交于
Pull MTD fixes from Brian Norris: - fix a small memory leak in some new ONFI code - account for additional odd variations of Micron SPI flash Acked by David Woodhouse. * tag 'for-linus-20131008' of git://git.infradead.org/linux-mtd: mtd: m25p80: Fix 4 byte addressing mode for Micron devices. mtd: nand: fix memory leak in ONFI extended parameter page
-
- 10 10月, 2013 4 次提交
-
-
由 Bharat Bhushan 提交于
When the MM code is invalidating a range of pages, it calls the KVM kvm_mmu_notifier_invalidate_range_start() notifier function, which calls kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages. However, the Linux PTEs for the range being flushed are still valid at that point. We are not supposed to establish any new references to pages in the range until the ...range_end() notifier gets called. The PPC-specific KVM code doesn't get any explicit notification of that; instead, we are supposed to use mmu_notifier_retry() to test whether we are or have been inside a range flush notifier pair while we have been referencing a page. This patch calls the mmu_notifier_retry() while mapping the guest page to ensure we are not referencing a page when in range invalidation. This call is inside a region locked with kvm->mmu_lock, which is the same lock that is called by the KVM MMU notifier functions, thus ensuring that no new notification can proceed while we are in the locked region. Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com> Acked-by: NAlexander Graf <agraf@suse.de> [Backported to 3.12 - Paolo] Reviewed-by: NBharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paul Mackerras 提交于
This fixes a typo in the code that saves the guest DSCR (Data Stream Control Register) into the kvm_vcpu_arch struct on guest exit. The effect of the typo was that the DSCR value was saved in the wrong place, so changes to the DSCR by the guest didn't persist across guest exit and entry, and some host kernel memory got corrupted. Cc: stable@vger.kernel.org [v3.1+] Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Gleb Natapov 提交于
72f85795 broke shadow on EPT. This patch reverts it and fixes PAE on nEPT (which reverted commit fixed) in other way. Shadow on EPT is now broken because while L1 builds shadow page table for L2 (which is PAE while L2 is in real mode) it never loads L2's GUEST_PDPTR[0-3]. They do not need to be loaded because without nested virtualization HW does this during guest entry if EPT is disabled, but in our case L0 emulates L2's vmentry while EPT is enables, so we cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values and need to re-read PDPTEs from L2 memory. This is what kvm_set_cr3() is doing, but by clearing cache bits during L2 vmentry we drop values that kvm_set_cr3() read from memory. So why the same code does not work for PAE on nEPT? kvm_set_cr3() reads pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to vcpu->arch.nested_mmu while nested guest is running, but ept_load_pdptrs() uses vcpu->arch.mmu which contain incorrect values. Fix that by using walk_mmu in ept_(load|save)_pdptrs. Signed-off-by: NGleb Natapov <gleb@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Tested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Henrik Rydberg 提交于
The crash reported and investigated in commit 5f4513 turned out to be caused by a change to the read interface on newer (2012) SMCs. Tests by Chris show that simply reading the data valid line is enough for the problem to go away. Additional tests show that the newer SMCs no longer wait for the number of requested bytes, but start sending data right away. Apparently the number of bytes to read is no longer specified as before, but instead found out by reading until end of data. Failure to read until end of data confuses the state machine, which eventually causes the crash. As a remedy, assuming bit0 is the read valid line, make sure there is nothing more to read before leaving the read function. Tested to resolve the original problem, and runtested on MBA3,1, MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on machines before 2012. Tested-by: NChris Murphy <chris@cmurf.com> Cc: stable@vger.kernel.org Signed-off-by: NHenrik Rydberg <rydberg@euromail.se> Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
-
- 09 10月, 2013 18 次提交
-
-
由 Peter Zijlstra 提交于
Reflow the function a bit because GCC gets confused: kernel/sched/fair.c: In function ‘task_numa_fault’: kernel/sched/fair.c:1448:3: warning: ‘my_grp’ may be used uninitialized in this function [-Wmaybe-uninitialized] kernel/sched/fair.c:1463:27: note: ‘my_grp’ was declared here Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-6ebt6x7u64pbbonq1khqu2z9@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Short spikes of CPU load can lead to a task being migrated away from its preferred node for temporary reasons. It is important that the task is migrated back to where it belongs, in order to avoid migrating too much memory to its new location, and generally disturbing a task's NUMA location. This patch fixes NUMA placement for 4 specjbb instances on a 4 node system. Without this patch, things take longer to converge, and processes are not always completely on their own node. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-64-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mel Gorman 提交于
As Peter says "If you're going to hold locks you can also do away with all that atomic_long_*() nonsense". Lock aquisition moved slightly to protect the updates. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-63-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Shared faults can lead to lots of unnecessary page migrations, slowing down the system, and causing private faults to hit the per-pgdat migration ratelimit. This patch adds sysctl numa_balancing_migrate_deferred, which specifies how many shared page migrations to skip unconditionally, after each page migration that is skipped because it is a shared fault. This reduces the number of page migrations back and forth in shared fault situations. It also gives a strong preference to the tasks that are already running where most of the memory is, and to moving the other tasks to near the memory. Testing this with a much higher scan rate than the default still seems to result in fewer page migrations than before. Memory seems to be somewhat better consolidated than previously, with multi-instance specjbb runs on a 4 node system. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
With the scan rate code working (at least for multi-instance specjbb), the large hammer that is "sched: Do not migrate memory immediately after switching node" can be replaced with something smarter. Revert temporarily migration disabling and all traces of numa_migrate_seq. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-61-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mel Gorman 提交于
With scan rate adaptions based on whether the workload has properly converged or not there should be no need for the scan period reset hammer. Get rid of it. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Adjust numa_scan_period in task_numa_placement, depending on how much useful work the numa code can do. The more local faults there are in a given scan window the longer the period (and hence the slower the scan rate) during the next window. If there are excessive shared faults then the scan period will decrease with the amount of scaling depending on whether the ratio of shared/private faults. If the preferred node changes then the scan rate is reset to recheck if the task is properly placed. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-59-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mel Gorman 提交于
Scan rate is altered based on whether shared/private faults dominated. task_numa_group() may detect false sharing but that information is not taken into account when adapting the scan rate. Take it into account. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-58-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Due to the way the pid is truncated, and tasks are moved between CPUs by the scheduler, it is possible for the current task_numa_fault to group together tasks that do not actually share memory together. This patch adds a few easy sanity checks to task_numa_fault, joining tasks together if they share the same tsk->mm, or if the fault was on a page with an elevated mapcount, in a shared VMA. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-57-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
This patch classifies scheduler domains and runqueues into types depending the number of tasks that are about their NUMA placement and the number that are currently running on their preferred node. The types are regular: There are tasks running that do not care about their NUMA placement. remote: There are tasks running that care about their placement but are currently running on a node remote to their ideal placement all: No distinction To implement this the patch tracks the number of tasks that are optimally NUMA placed (rq->nr_preferred_running) and the number of tasks running that care about their placement (nr_numa_running). The load balancer uses this information to avoid migrating idea placed NUMA tasks as long as better options for load balancing exists. For example, it will not consider balancing between a group whose tasks are all perfectly placed and a group with remote tasks. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1381141781-10992-56-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
This patch separately considers task and group affinities when searching for swap candidates during NUMA placement. If tasks are part of the same group, or no group at all, the task weights are considered. Some hysteresis is added to prevent tasks within one group from getting bounced between NUMA nodes due to tiny differences. If tasks are part of different groups, the code compares group weights, in order to favor grouping task groups together. The patch also changes the group weight multiplier to be the same as the task weight multiplier, since the two are no longer added up like before. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-55-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
This patch separately considers task and group affinities when searching for swap candidates during task NUMA placement. If tasks are not part of a group or the same group then the task weights are considered. Otherwise the group weights are compared. Signed-off-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-54-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Signed-off-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1381141781-10992-53-git-send-email-mgorman@suse.de
-
由 Mel Gorman 提交于
Having multiple tasks in a group go through task_numa_placement simultaneously can lead to a task picking a wrong node to run on, because the group stats may be in the middle of an update. This patch avoids parallel updates by holding the numa_group lock during placement decisions. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-52-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
It is possible for a task in a numa group to call exec, and have the new (unrelated) executable inherit the numa group association from its former self. This has the potential to break numa grouping, and is trivial to fix. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mel Gorman 提交于
This patch uses the fraction of faults on a particular node for both task and group, to figure out the best node to place a task. If the task and group statistics disagree on what the preferred node should be then a full rescan will select the node with the best combined weight. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-50-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
A newly spawned thread inside a process should stay on the same NUMA node as its parent. This prevents processes from being "torn" across multiple NUMA nodes every time they spawn a new thread. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-49-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mel Gorman 提交于
With the THP migration races closed it is still possible to occasionally see corruption. The problem is related to handling PMD pages in batch. When a page fault is handled it can be assumed that the page being faulted will also be flushed from the TLB. The same flushing does not happen when handling PMD pages in batch. Fixing is straight forward but there are a number of reasons not to 1. Multiple TLB flushes may have to be sent depending on what pages get migrated 2. The handling of PMDs in batch means that faults get accounted to the task that is handling the fault. While care is taken to only mark PMDs where the last CPU and PID match it can still have problems due to PID truncation when matching PIDs. 3. Batching on the PMD level may reduce faults but setting pmd_numa requires taking a heavy lock that can contend with THP migration and handling the fault requires the release/acquisition of the PTL for every page migrated. It's still pretty heavy. PMD batch handling is not something that people ever have been happy with. This patch removes it and later patches will deal with the additional fault overhead using more installigent migrate rate adaption. Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-48-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-