- 16 12月, 2009 40 次提交
-
-
由 Wu Fengguang 提交于
So that an outside user can free the reference count grabbed by try_get_mem_cgroup_from_page(). CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> CC: Hugh Dickins <hugh.dickins@tiscali.co.uk> CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> CC: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Wu Fengguang 提交于
So that the hwpoison injector can get mem_cgroup for arbitrary page and thus know whether it is owned by some mem_cgroup task(s). [AK: Merged with latest git tree] CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> CC: Hugh Dickins <hugh.dickins@tiscali.co.uk> CC: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> CC: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Wu Fengguang 提交于
Rename get_uflags() to stable_page_flags() and make it a global function for use in the hwpoison page flags filter, which need to compare user page flags with the value provided by user space. Also move KPF_* to kernel-page-flags.h for use by user space tools. Acked-by: NMatt Mackall <mpm@selenic.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> CC: Nick Piggin <npiggin@suse.de> CC: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Wu Fengguang 提交于
The unpoisoning interface is useful for stress testing tools to reclaim poisoned pages (to prevent OOM) There is no hardware level unpoisioning, so this cannot be used for real memory errors, only for software injected errors. Note that it may leak pages silently - those who have been removed from LRU cache, but not isolated from page cache/swap cache at hwpoison time. Especially the stress test of dirty swap cache pages shall reboot system before exhausting memory. AK: Fix comments, add documentation, add printks, rename symbol Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Andi Kleen 提交于
Now that "ref" is just a boolean turn it into a flags argument. First step is only a single flag that makes the code's intention more clear, but more may follow. Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Andi Kleen 提交于
shake_page handles more types of page caches than lru_drain_all() - per cpu page allocator pages - per CPU LRU Stops early when the page became free. Used in followon patches. Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Samu Onkalo 提交于
Implement selftest feature as specified by chip manufacturer. Control: read selftest sysfs entry Response: "OK x y z" or "FAIL x y z" where x, y, and z are difference between selftest mode and normal mode. Test is passed when values are within acceptance limit values. Acceptance limits are provided via platform data. See chip spesifications for acceptance limits. If limits are not properly set, OK / FAIL decision is meaningless. However, userspace application can still make decision based on the numeric x, y, z values. Selftest is meant for HW diagnostic purposes. It is not meant to be called during normal use of the chip. It may cause false interrupt events. Selftest mode delays polling of the normal results but it doesn't cause wrong values. Chip must be in static state during selftest. Any acceration during the test causes most probably failure. Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com> Acked-by: NÉric Piel <Eric.Piel@tremplin-utc.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Samu Onkalo 提交于
Add the possibility to remap axes via platform data. Function pointers for resource setup and release purposes Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com> Acked-by: NÉric Piel <eric.piel@tremplin-utc.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Trisal, Kalhan" <kalhan.trisal@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicolas Ferre 提交于
Allow the use of another DMA controller driver in atmel-mci sd/mmc driver. This adds a generic dma_slave pointer to the mci platform structure where we can store DMA controller information. In atmel-mci we use information provided by this structure to initialize the driver (with new helper functions that are architecture dependant). This also adds at32/avr32 chip modifications to cope with this new access method. Signed-off-by: NNicolas Ferre <nicolas.ferre@atmel.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
Recently, We marked strstrip() as must_check. because it was frequently misused and it should be checked. However, we found one exception. scsi/ipr.c intentionally ignore return value of strstrip. Because it wishes to keep the whitespace at the beginning. Thus we need to keep with and without checked whitespace trim function. This patch adds a new strim() and changes ipr.c to use it. [akpm@linux-foundation.org: coding-style fixes] Suggested-by: NAlan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Shrinks vmlinux without: $ size vmlinux text data bss dec hex filename 6975863 679652 1359668 9015183 898f8f vmlinux with: $ size vmlinux text data bss dec hex filename 6975639 679652 1359668 9014959 898eaf vmlinux Signed-off-by: NJoe Perches <joe@perches.com> Cc: Jeff Garzik <jgarzik@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 André Goddard Rosa 提交于
On the following sentence: while (*s && isspace(*s)) s++; If *s == 0, isspace() evaluates to ((_ctype[*s] & 0x20) != 0), which evaluates to ((0x08 & 0x20) != 0) which equals to 0 as well. If *s == 1, we depend on isspace() result anyway. In other words, "a char equals zero is never a space", so remove this check. Also, *s != 0 is most common case (non-null string). Fixed const return as noticed by Jan Engelhardt and James Bottomley. Fixed unnecessary extra cast on strstrip() as noticed by Jan Engelhardt. Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 André Goddard Rosa 提交于
While at it, use tabs to indent the comments. Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Bernhard Walle 提交于
The kernel offers with TIOCL_GETKMSGREDIRECT ioctl() the possibility to redirect the kernel messages to a specific console. However, since it's not possible to switch to the kernel message console after a panic(), it would be nice if the kernel would print the panic message on the current console. This patch series adds a new interface to access the global kmsg_redirect variable by a function to be able to use it in code where CONFIG_VT_CONSOLE is not set (kernel/panic.c). This patch: Instead of using and exporting a global value kmsg_redirect, introduce a function vt_kmsg_redirect() that both can set and return the console where messages are printed. Change all users of kmsg_redirect (the VT code itself and kernel/power.c) to the new interface. The main advantage is that vt_kmsg_redirect() can also be used when CONFIG_VT_CONSOLE is not set. Signed-off-by: NBernhard Walle <bernhard@bwalle.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
..and include them in the lxfb/gxfb drivers rather than asm/geode.h (where possible). Signed-off-by: NAndres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
Signed-off-by: NAndres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
The only thing that uses this is the reboot_fixups code. Signed-off-by: NAndres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
This is based on the old code on arch/x86/kernel/mfgpt_32.c, except it's not x86 specific, it's modular, and it makes use of a PCI BAR rather than a random MSR. Currently module unloading is not supported; it's uncertain whether or not it can be made work with the hardware. [akpm@linux-foundation.org: add X86 dependency] Signed-off-by: NAndres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
This creates a CS5535/CS5536 GPIO driver which uses a gpio_chip backend (allowing GPIO users to use the generic GPIO API if desired) while also allowing architecture-specific users directly (via the cs5535_gpio_* functions). Tested on an OLPC machine. Some Leemotes also use CS5536 (with a mips cpu), which is why this is in drivers/gpio rather than arch/x86. Currently, it conflicts with older geode GPIO support; once MFGPT support is reworked to also be more generic, the older geode code will be removed. Signed-off-by: NAndres Salomon <dilinger@collabora.co.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: David Brownell <david-b@pacbell.net> Reviewed-by: NAlessandro Zummo <a.zummo@towertech.it> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Phil Carmody 提交于
There are quite a few instances in the kernel of checks of pointers both against NULL and against the errno range, handling both cases identically. This additional helper function would simplify such code. [akpm@linux-foundation.org: build fix] Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hiroshi Shimamoto 提交于
journal_info in task_struct is used in journaling file system only. So introduce CONFIG_FS_JOURNAL_INFO and make it conditional. Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Add a printk_ratelimited statement expression macro that uses a per-call ratelimit_state so that multiple subsystems output messages are not suppressed by a global __ratelimit state. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: s/_rl/_ratelimited/g] Signed-off-by: NJoe Perches <joe@perches.com> Cc: Naohiro Ooiwa <nooiwa@miraclelinux.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Amerigo Wang 提交于
According to feature-removal-schedule.txt, it is the time to remove print_fn_descriptor_symbol(). And a quick grep shows that it no longer has any callers. Signed-off-by: NWANG Cong <amwang@redhat.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Amerigo Wang 提交于
rwsem_is_locked() tests ->activity without locks, so we should always keep ->activity consistent. However, the code in __rwsem_do_wake() breaks this rule, it updates ->activity after _all_ readers waken up, this may give some reader a wrong ->activity value, thus cause rwsem_is_locked() behaves wrong. Quote from Andrew: " - we have one or more processes sleeping in down_read(), waiting for access. - we wake one or more processes up without altering ->activity - they start to run and they do rwsem_is_locked(). This incorrectly returns "false", because the waker process is still crunching away in __rwsem_do_wake(). - the waker now alters ->activity, but it was too late. " So we need get a spinlock to protect this. And rwsem_is_locked() should not block, thus we use spin_trylock_irqsave(). [akpm@linux-foundation.org: simplify code] Reported-by: NBrian Behlendorf <behlendorf1@llnl.gov> Cc: Ben Woodard <bwoodard@llnl.gov> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NWANG Cong <amwang@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Don't initialize __print_once. Invert the test to reduce initialized data. defconfig before: $size vmlinux text data bss dec hex filename 6976022 679572 1359668 9015262 898fde vmlinux defconfig after: $size vmlinux text data bss dec hex filename 6976006 679508 1359700 9015214 898fae vmlinux Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
If CONFIG_DYNAMIC_DEBUG is enabled and a source file has: #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/kernel.h> dynamic_debug.h will duplicate KBUILD_MODNAME in the output string. Remove the use of KBUILD_MODNAME from the output format string generated by dynamic_debug.h If CONFIG_DYNAMIC_DEBUG is not enabled, no compile-time check is done to printk/dev_printk arguments. Add it. Signed-off-by: NJoe Perches <joe@perches.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Commit 70867453 ("printk_once(): use bool for boolean flag") changed printk_once() to use bool instead of int for its guard variable. Do the same change to WARN_ONCE() and WARN_ON_ONCE(), for the same reasons. This resulted in a reduction of 1462 bytes on a x86-64 defconfig: text data bss dec hex filename 8101271 1207116 992764 10301151 9d2edf vmlinux.before 8100553 1207148 991988 10299689 9d2929 vmlinux.after Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Cc: Roland Dreier <rolandd@cisco.com> Cc: Daniel Walker <dwalker@fifo99.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jie Zhang 提交于
The NOMMU code currently clears all anonymous mmapped memory. While this is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: NJie Zhang <jie.zhang@analog.com> Signed-off-by: NRobin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: NMike Frysinger <vapier@gentoo.org> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NPaul Mundt <lethal@linux-sh.org> Acked-by: NGreg Ungerer <gerg@snapgear.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
This patch enables extraction of the pfn of a hugepage from /proc/pid/pagemap in an architecture independent manner. Details ------- My test program (leak_pagemap) works as follows: - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,) - read()/write() something on it, - call page-types with option -p, - munmap() and unlink() the file on hugetlbfs Without my patches ------------------ $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 101 0 The output of page-types don't show any hugepage. With my patches --------------- $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge 0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap 0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 51300 200 The output of page-types shows 51200 pages contributing to hugepages, containing 100 head pages and 51100 tail pages as expected. [akpm@linux-foundation.org: build fix] Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Shijie 提交于
The check code for CONFIG_SWAP is redundant, because there is a non-CONFIG_SWAP version for PageSwapCache() which just returns 0. Signed-off-by: NHuang Shijie <shijie8@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
It has no references outside memory_hotplug.c. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The previous patch enables page migration of ksm pages, but that soon gets into trouble: not surprising, since we're using the ksm page lock to lock operations on its stable_node, but page migration switches the page whose lock is to be used for that. Another layer of locking would fix it, but do we need that yet? Do we actually need page migration of ksm pages? Yes, memory hotremove needs to offline sections of memory: and since we stopped allocating ksm pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE candidates for migration. But KSM is currently unconscious of NUMA issues, happily merging pages from different NUMA nodes: at present the rule must be, not to use MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of ksm pages does not make sense yet. So, to complete support for ksm swapping we need to make hotremove safe. ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages are freed before migration reaches them, stable_nodes may be left still pointing to struct pages which have been removed from the system: the stable_node needs to identify a page by pfn rather than page pointer, then it can safely prune them when MEM_OFFLINE. And make NUMA migration skip PageKsm pages where it skips PageReserved. But it's only when we reach unmap_and_move() that the page lock is taken and we can be sure that raised pagecount has prevented a PageAnon from being upgraded: so add offlining arg to migrate_pages(), to migrate ksm page when offlining (has sufficient locking) but reject it otherwise. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
A side-effect of making ksm pages swappable is that they have to be placed on the LRUs: which then exposes them to isolate_lru_page() and hence to page migration. Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c. Perhaps some consolidation with existing code is possible, but don't attempt that yet (try_to_unmap needs to handle nonlinears, but migration pte removal does not). rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like remove_anon_migration_ptes() which it replaces, avoids calling page_lock_anon_vma(), because that includes a page_mapped() test which fails when all migration ptes are in place. That was valid when NUMA page migration was introduced (holding mmap_sem provided the missing guarantee that anon_vma's slab had not already been destroyed), but I believe not valid in the memory hotremove case added since. For now do the same as before, and consider the best way to fix that unlikely race later on. When fixed, we can probably use rmap_walk() on hwpoisoned ksm pages too: for now, they remain among hwpoison's various exceptions (its PageKsm test comes before the page is locked, but its page_lock_anon_vma fails safely if an anon gets upgraded). Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
For full functionality, page_referenced_one() and try_to_unmap_one() need to know the vma: to pass vma down to arch-dependent flushes, or to observe VM_LOCKED or VM_EXEC. But KSM keeps no record of vma: nor can it, since vmas get split and merged without its knowledge. Instead, note page's anon_vma in its rmap_item when adding to stable tree: all the vmas which might map that page are listed by its anon_vma. page_referenced_ksm() and try_to_unmap_ksm() then traverse the anon_vma, first to find the probable vma, that which matches rmap_item's mm; but if that is not enough to locate all instances, traverse again to try the others. This catches those occasions when fork has duplicated a pte of a ksm page, but ksmd has not yet come around to assign it an rmap_item. But each rmap_item in the stable tree which refers to an anon_vma needs to take a reference to it. Andrea's anon_vma design cleverly avoided a reference count (an anon_vma was free when its list of vmas was empty), but KSM now needs to add that. Is a 32-bit count sufficient? I believe so - the anon_vma is only free when both count is 0 and list is empty. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Initial implementation for swapping out KSM's shared pages: add page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when faced with a PageKsm page. Most of what's needed can be got from the rmap_items listed from the stable_node of the ksm page, without discovering the actual vma: so in this patch just fake up a struct vma for page_referenced_one() or try_to_unmap_one(), then refine that in the next patch. Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been implicit there (being only set with VM_SHARED, already excluded), but let's make it explicit, to help justify the lack of nonlinear unmap. Rely on the page lock to protect against concurrent modifications to that page's node of the stable tree. The awkward part is not swapout but swapin: do_swap_page() and page_add_anon_rmap() now have to allow for new possibilities - perhaps a ksm page still in swapcache, perhaps a swapcache page associated with one location in one anon_vma now needed for another location or anon_vma. (And the vma might even be no longer VM_MERGEABLE when that happens.) ksm_might_need_to_copy() checks for that case, and supplies a duplicate page when necessary, simply leaving it to a subsequent pass of ksmd to rediscover the identity and merge them back into one ksm page. Disappointingly primitive: but the alternative would have to accumulate unswappable info about the swapped out ksm pages, limiting swappability. Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the particular case it was handling, so just use it instead. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Add a pointer to the ksm page into struct stable_node, holding a reference to the page while the node exists. Put a pointer to the stable_node into the ksm page's ->mapping. Then we don't need get_ksm_page() while traversing the stable tree: the page to compare against is sure to be present and correct, even if it's no longer visible through any of its existing rmap_items. And we can handle the forked ksm page case more efficiently: no need to memcmp our way through the tree to find its match. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Remove three degrees of obfuscation, left over from when we had CONFIG_UNEVICTABLE_LRU. MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is CONFIG_HAVE_MLOCK is CONFIG_MMU. rmap.o (and memory-failure.o) are only built when CONFIG_MMU, so don't need such conditions at all. Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from 169 defconfigs: leave those to evolve in due course. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set in page->mapping, with the higher bits a pointer to the anon_vma; and have defined PageKsm(page) as that with NULL anon_vma. But KSM swapping will need to store a pointer there: so in preparation for that, now define PAGE_MAPPING_FLAGS as the low two bits, including PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some other use for the bit emerges). Declare page_rmapping(page) to return the pointer part of page->mapping, and page_anon_vma(page) to return the anon_vma pointer when that's what it is. Use these in a few appropriate places: notably, unuse_vma() has been testing page->mapping, but is better to be testing page_anon_vma() (cases may be added in which flag bits are set without any pointer). Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. [mel@csn.ul.ie: add stats to track how relevant the logic is] [mel@csn.ul.ie: make kswapd only check its own zones and rename the relevant counters] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-