- 13 11月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Brown <neilb@suse.de> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>
-
- 27 10月, 2010 1 次提交
-
-
由 Kay Sievers 提交于
System management wants to subscribe to changes in swap configuration. Make /proc/swaps pollable like /proc/mounts. [akpm@linux-foundation.org: document proc_poll_event] Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Acked-by: NGreg KH <greg@kroah.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 9月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 9月, 2010 5 次提交
-
-
由 Christoph Hellwig 提交于
The swap code already uses synchronous discards, no need to add I/O barriers. tj: superflous newlines removed. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NHugh Dickins <hughd@google.com> Tested-by: NNigel Cunningham <nigel@tuxonice.net> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Hugh Dickins 提交于
Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs show a shift since I last tested in December: in part because of firmware updates, in part because of the necessary move from barriers to awaiting completion at the block layer. While discard at swapon still shows as slightly beneficial on both, discarding 1MB swap cluster when allocating is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV). Surrender: discard as presently implemented is more hindrance than help for swap; but might prove useful on other devices, or with improvements. So continue to do the discard at swapon, but make discard while swapping conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using only the lower 16 bits of int flags). We can add a --discard or -d to swapon(8), and a "discard" to swap in /etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
The swap code already uses synchronous discards, no need to add I/O barriers. This fixes the worst of the terrible slowdown in swap allocation for hibernation, reported on 2.6.35 by Nigel Cunningham; but does not entirely eliminate that regression. [tj@kernel.org: superflous newlines removed] Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NNigel Cunningham <nigel@tuxonice.net> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Move the hibernation check from scan_swap_map() into try_to_free_swap(): to catch not only the common case when hibernation's allocation itself triggers swap reuse, but also the less likely case when concurrent page reclaim (shrink_page_list) might happen to try_to_free_swap from a page. Hibernation already clears __GFP_IO from the gfp_allowed_mask, to stop reclaim from going to swap: check that to prevent swap reuse too. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Please revert 2.6.36-rc commit d2997b10 "hibernation: freeze swap at hibernation". It complicated matters by adding a second swap allocation path, just for hibernation; without in any way fixing the issue that it was intended to address - page reclaim after fixing the hibernation image might free swap from a page already imaged as swapcache, letting its swap be reallocated to store a different page of the image: resulting in data corruption if the imaged page were freed as clean then swapped back in. Pages freed to si->swap_map were still in danger of being reallocated by the alternative allocation path. I guess it inadvertently fixed slow SSD swap allocation for hibernation, as reported by Nigel Cunningham: by missing out the discards that occur on the usual swap allocation path; but that was unintentional, and needs a separate fix. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 8月, 2010 2 次提交
-
-
由 KAMEZAWA Hiroyuki 提交于
When taking a memory snapshot in hibernate_snapshot(), all (directly called) memory allocations use GFP_ATOMIC. Hence swap misusage during hibernation never occurs. But from a pessimistic point of view, there is no guarantee that no page allcation has __GFP_WAIT. It is better to have a global indication "we enter hibernation, don't use swap!". This patch tries to freeze new-swap-allocation during hibernation. (All user processes are frozenm so swapin is not a concern). This way, no updates will happen to swap_map[] between hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free() is called. We can be assured that swap corruption will not occur. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
Since 2.6.31, swap_map[]'s refcounting was changed to show that a used swap entry is just for swap-cache, can be reused. Then, while scanning free entry in swap_map[], a swap entry may be able to be reclaimed and reused. It was caused by commit c9e44410 ("mm: reuse unused swap entry if necessary"). But this caused deta corruption at resume. The scenario is - Assume a clean-swap cache, but mapped. - at hibernation_snapshot[], clean-swap-cache is saved as clean-swap-cache and swap_map[] is marked as SWAP_HAS_CACHE. - then, save_image() is called. And reuse SWAP_HAS_CACHE entry to save image, and break the contents. After resume: - the memory reclaim runs and finds clean-not-referenced-swap-cache and discards it because it's marked as clean. But here, the contents on disk and swap-cache is inconsistent. Hance memory is corrupted. This patch avoids the bug by not reclaiming swap-entry during hibernation. This is a quick fix for backporting. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: NOndreg Zary <linux@rainbow-software.org> Tested-by: NOndreg Zary <linux@rainbow-software.org> Tested-by: NAndrea Gelmini <andrea.gelmini@gmail.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 5月, 2010 2 次提交
-
-
由 Nitin Gupta 提交于
This callback is required when RAM based devices are used as swap disks. One such device is ramzswap which is used as compressed in-memory swap disk. For such devices, we need a callback as soon as a swap slot is no longer used to allow freeing memory allocated for this slot. Without this callback, stale data can quickly accumulate in memory defeating the whole purpose of such devices. Signed-off-by: NNitin Gupta <ngupta@vflare.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NNigel Cunningham <nigel@tuxonice.net> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Nitin Gupta 提交于
Added SWP_BLKDEV flag to distinguish block and regular file backed swap devices. We could also check if a swap is entire block device, rather than a file, by: S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode) but, I think, simply checking this flag is more convenient. Signed-off-by: NNitin Gupta <ngupta@vflare.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NNigel Cunningham <nigel@tuxonice.net> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 29 4月, 2010 1 次提交
-
-
由 Dmitry Monakhov 提交于
The patch just convert all blkdev_issue_xxx function to common set of flags. Wait/allocation semantics preserved. Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 13 3月, 2010 1 次提交
-
-
由 Daisuke Nishimura 提交于
This patch is another core part of this move-charge-at-task-migration feature. It enables moving charges of anonymous swaps. To move the charge of swap, we need to exchange swap_cgroup's record. In current implementation, swap_cgroup's record is protected by: - page lock: if the entry is on swap cache. - swap_lock: if the entry is not on swap cache. This works well in usual swap-in/out activity. But this behavior make the feature of moving swap charge check many conditions to exchange swap_cgroup's record safely. So I changed modification of swap_cgroup's recored(swap_cgroup_record()) to use xchg, and define a new function to cmpxchg swap_cgroup's record. This patch also enables moving charge of non pte_present but not uncharged swap caches, which can be exist on swap-out path, by getting the target pages via find_get_page() as do_mincore() does. [kosaki.motohiro@jp.fujitsu.com: fix ia64 build] [akpm@linux-foundation.org: fix typos] Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 3月, 2010 4 次提交
-
-
由 Hugh Dickins 提交于
swap_duplicate()'s loop appears to miss out on returning the error code from __swap_duplicate(), except when that's -ENOMEM. In fact this is intentional: prior to -ENOMEM for swap_count_continuation, swap_duplicate() was void (and the case only occurs when copy_one_pte() hits a corrupt pte). But that's surprising behaviour, which certainly deserves a comment. Signed-off-by: NHugh Dickins <hughd@google.com> Reported-by: NHuang Shijie <shijie8@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
There's an off-by-one disagreement between mkswap and swapon about the meaning of swap_header last_page: mkswap (in all versions I've looked at: util-linux-ng and BusyBox and old util-linux; probably as far back as 1999) consistently means the offset (in page units) of the last page of the swap area, whereas kernel sys_swapon (as far back as 2.2 and 2.3) strangely takes it to mean the size (in page units) of the swap area. This disagreement is the safe way round; but it's worrying people, and loses us one page of swap. The fix is not just to add one to nr_good_pages: we need to get maxpages (the size of the swap_map array) right before that; and though that is an unsigned long, be careful not to overflow the unsigned int p->max which later holds it (probably why header uses __u32 last_page instead of size). Why did we subtract one from the maximum swp_offset to calculate maxpages? Though it was probably me who made that change in 2.4.10, I don't get it: and now we should be adding one (without risk of overflow in this case). Fix the handling of swap_header badpages: it could have overrun the swap_map when very large swap area used on a more limited architecture. Remove pre-initializations of swap_header, nr_good_pages and maxpages: those date from when sys_swapon was supporting other versions of header. Reported-by: NNitin Gupta <ngupta@vflare.org> Reported-by: NJarkko Lavinen <jarkko.lavinen@nokia.com> Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
A frequent questions from users about memory management is what numbers of swap ents are user for processes. And this information will give some hints to oom-killer. Besides we can count the number of swapents per a process by scanning /proc/<pid>/smaps, this is very slow and not good for usual process information handler which works like 'ps' or 'top'. (ps or top is now enough slow..) This patch adds a counter of swapents to mm_counter and update is at each swap events. Information is exported via /proc/<pid>/status file as [kamezawa@bluextal memory]$ cat /proc/self/status Name: cat State: R (running) Tgid: 2910 Pid: 2910 PPid: 2823 TracerPid: 0 Uid: 500 500 500 500 Gid: 500 500 500 500 FDSize: 256 Groups: 500 VmPeak: 82696 kB VmSize: 82696 kB VmLck: 0 kB VmHWM: 432 kB VmRSS: 432 kB VmData: 172 kB VmStk: 84 kB VmExe: 48 kB VmLib: 1568 kB VmPTE: 40 kB VmSwap: 0 kB <=============== this. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
Presently, per-mm statistics counter is defined by macro in sched.h This patch modifies it to - defined in mm.h as inlinf functions - use array instead of macro's name creation. This patch is for reducing patch size in future patch to modify implementation of per-mm counter. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2009 11 次提交
-
-
由 Hugh Dickins 提交于
Initial implementation for swapping out KSM's shared pages: add page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when faced with a PageKsm page. Most of what's needed can be got from the rmap_items listed from the stable_node of the ksm page, without discovering the actual vma: so in this patch just fake up a struct vma for page_referenced_one() or try_to_unmap_one(), then refine that in the next patch. Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been implicit there (being only set with VM_SHARED, already excluded), but let's make it explicit, to help justify the lack of nonlinear unmap. Rely on the page lock to protect against concurrent modifications to that page's node of the stable tree. The awkward part is not swapout but swapin: do_swap_page() and page_add_anon_rmap() now have to allow for new possibilities - perhaps a ksm page still in swapcache, perhaps a swapcache page associated with one location in one anon_vma now needed for another location or anon_vma. (And the vma might even be no longer VM_MERGEABLE when that happens.) ksm_might_need_to_copy() checks for that case, and supplies a duplicate page when necessary, simply leaving it to a subsequent pass of ksmd to rediscover the identity and merge them back into one ksm page. Disappointingly primitive: but the alternative would have to accumulate unswappable info about the swapped out ksm pages, limiting swappability. Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the particular case it was handling, so just use it instead. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set in page->mapping, with the higher bits a pointer to the anon_vma; and have defined PageKsm(page) as that with NULL anon_vma. But KSM swapping will need to store a pointer there: so in preparation for that, now define PAGE_MAPPING_FLAGS as the low two bits, including PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some other use for the bit emerges). Declare page_rmapping(page) to return the pointer part of page->mapping, and page_anon_vma(page) to return the anon_vma pointer when that's what it is. Use these in a few appropriate places: notably, unuse_vma() has been testing page->mapping, but is better to be testing page_anon_vma() (cases may be added in which flag bits are set without any pointer). Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Seems that page_io.c doesn't really need to know that page_private(page) is the swp_entry 'val'. Rework map_swap_page() to do what its name says and map a page to a page offset in the swap space. The only other caller of map_swap_page() is internal to mm/swapfile.c and it does want to map a swap entry to the 'sector'. So rename map_swap_page() to map_swap_entry(), make it 'static' and and implement map_swap_page() as a wrapper around that. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
While we're fiddling with the swap_map values, let's assign a particular value to shmem/tmpfs swap pages: their swap counts are never incremented, and it helps swapoff's try_to_unuse() a little if it can immediately distinguish those pages from process pages. Since we've no use for SWAP_MAP_BAD | COUNT_CONTINUED, we might as well use that 0xbf value for SWAP_MAP_SHMEM. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Swap is duplicated (reference count incremented by one) whenever the same swap page is inserted into another mm (when forking finds a swap entry in place of a pte, or when reclaim unmaps a pte to insert the swap entry). swap_info_struct's vmalloc'ed swap_map is the array of these reference counts: but what happens when the unsigned short (or unsigned char since the preceding patch) is full? (and its high bit is kept for a cache flag) We then lose track of it, never freeing, leaving it in use until swapoff: at which point we _hope_ that a single pass will have found all instances, assume there are no more, and will lose user data if we're wrong. Swapping of KSM pages has not yet been enabled; but it is implemented, and makes it very easy for a user to overflow the maximum swap count: possible with ordinary process pages, but unlikely, even when pid_max has been raised from PID_MAX_DEFAULT. This patch implements swap count continuations: when the count overflows, a continuation page is allocated and linked to the original vmalloc'ed map page, and this used to hold the continuation counts for that entry and its neighbours. These continuation pages are seldom referenced: the common paths all work on the original swap_map, only referring to a continuation page when the low "digit" of a count is incremented or decremented through SWAP_MAP_MAX. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Halve the vmalloc'ed swap_map array from unsigned shorts to unsigned chars: it's still very unusual to reach a swap count of 126, and the next patch allows it to be extended indefinitely. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Though swap_count() is useful, I'm finding that swap_has_cache() and encode_swapmap() obscure what happens in the swap_map entry, just at those points where I need to understand it. Remove them, and pass more usable "usage" values to scan_swap_map(), swap_entry_free() and __swap_duplicate(), instead of the SWAP_MAP and SWAP_CACHE enum. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Move CONFIG_HIBERNATION's swapdev_block() into the main CONFIG_HIBERNATION block, remove extraneous whitespace and return, fix typo in a comment. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Make better use of the space by folding first swap_extent into its swap_info_struct, instead of just the list_head: swap partitions need only that one, and for others it's used as a circular list anyway. [jirislaby@gmail.com: fix crash on double swapon] Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The swap_info_struct is only 76 or 104 bytes, but it does seem wrong to reserve an array of about 30 of them in bss, when most people will want only one. Change swap_info[] to an array of pointers. That does need a "type" field in the structure: pack it as a char with next type and short prio (aha, char is unsigned by default on PowerPC). Use the (admittedly peculiar) name "type" throughout for this index. /proc/swaps does not take swap_lock: I wouldn't want it to, but do take care with barriers when adding a new item to the array (never removed). Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The swap_info_struct is mostly private to mm/swapfile.c, with only one other in-tree user: get_swap_bio(). Adjust its interface to map_swap_page(), so that we can then remove get_swap_info_struct(). But there is a popular user out-of-tree, TuxOnIce: so leave the declaration of swap_info_struct in linux/swap.h. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Nigel Cunningham <ncunningham@crca.org.au> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 11月, 2009 1 次提交
-
-
由 Bo Liu 提交于
In try_to_unuse(), swcount is a local copy of *swap_map, including the SWAP_HAS_CACHE bit; but a wrong comparison against swap_count(*swap_map), which masks off the SWAP_HAS_CACHE bit, succeeded where it should fail. That had the effect of resetting the mm from which to start searching for the next swap page, to an irrelevant mm instead of to an mm in which this swap page had been found: which may increase search time by ~20%. But we're used to swapoff being slow, so never noticed the slowdown. Remove that one spurious use of swap_count(): Bo Liu thought it merely redundant, Hugh rewrote the description since it was measurably wrong. Signed-off-by: NBo Liu <bo-liu@hotmail.com> Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 10月, 2009 1 次提交
-
-
由 Suresh Jayaraman 提交于
While testing Swap over NFS patchset, I noticed an oops that was triggered during swapon. Investigating further, the NULL pointer deference is due to the SSD device check/optimization in the swapon code that assumes s_bdev could never be NULL. inode->i_sb->s_bdev could be NULL in a few cases. For e.g. one such case is loopback NFS mount, there could be others as well. Fix this by ensuring s_bdev is not NULL before we try to deference s_bdev. Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 22 9月, 2009 1 次提交
-
-
由 Hugh Dickins 提交于
Just as the swapoff system call allocates many pages of RAM to various processes, perhaps triggering OOM, so "echo 2 >/sys/kernel/mm/ksm/run" (unmerge) is liable to allocate many pages of RAM to various processes, perhaps triggering OOM; and each is normally run from a modest admin process (swapoff or shell), easily repeated until it succeeds. So treat unmerge_and_remove_all_rmap_items() in the same way that we treat try_to_unuse(): generalize PF_SWAPOFF to PF_OOM_ORIGIN, and bracket both with that, to ask the OOM killer to kill them first, to prevent them from spawning more and more OOM kills. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: NIzik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 9月, 2009 1 次提交
-
-
由 Andi Kleen 提交于
Memory migration uses special swap entry types to trigger special actions on page faults. Extend this mechanism to also support poisoned swap entries, to trigger poison handling on page faults. This allows follow-on patches to prevent processes from faulting in poisoned pages again. v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu) v3: Better overflow fix (Hidehiro Kawai) Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
- 14 9月, 2009 1 次提交
-
-
由 Christoph Hellwig 提交于
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 30 7月, 2009 1 次提交
-
-
由 Alan Jenkins 提交于
Create bdgrab(). This function copies an existing reference to a block_device. It is safe to call from any context. Hibernation code wishes to copy a reference to the active swap device. Right now it calls bdget() under a spinlock, but this is wrong because bdget() can sleep. It doesn't need a full bdget() because we already hold a reference to active swap devices (and the spinlock protects against swapoff). Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827Signed-off-by: NAlan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 19 6月, 2009 1 次提交
-
-
由 KAMEZAWA Hiroyuki 提交于
This patch fixes mis-accounting of swap usage in memcg. In the current implementation, memcg's swap account is uncharged only when swap is completely freed. But there are several cases where swap cannot be freed cleanly. For handling that, this patch changes that memcg uncharges swap account when swap has no references other than cache. By this, memcg's swap entry accounting can be fully synchronous with the application's behavior. This patch also changes memcg's hooks for swap-out. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: NBalbir Singh <balbir@in.ibm.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 6月, 2009 3 次提交
-
-
由 KAMEZAWA Hiroyuki 提交于
Presently we can know a swap entry is just used as SwapCache via swap_map, without looking up swap cache. Then, we have a chance to reuse swap-cache-only swap entries in get_swap_pages(). This patch tries to free swap-cache-only swap entries if swap is not enough. Note: We hit following path when swap_cluster code cannot find a free cluster. Then, vm_swap_full() is not only condition to allow the kernel to reclaim unused swap. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NBalbir Singh <balbir@in.ibm.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Tested-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
This is a part of the patches for fixing memcg's swap accountinf leak. But, IMHO, not a bad patch even if no memcg. There are 2 kinds of references to swap. - reference from swap entry - reference from swap cache Then, - If there is swap cache && swap's refcnt is 1, there is only swap cache. (*) swapcount(entry) == 1 && find_get_page(swapper_space, entry) != NULL This counting logic have worked well for a long time. But considering that we cannot know there is a _real_ reference or not by swap_map[], current usage of counter is not very good. This patch adds a flag SWAP_HAS_CACHE and recored information that a swap entry has a cache or not. This will remove -1 magic used in swapfile.c and be a help to avoid unnecessary find_get_page(). Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
In a following patch, the usage of swap cache is recorded into swap_map. This patch is for necessary interface changes to do that. 2 interfaces: - swapcache_prepare() - swapcache_free() are added for allocating/freeing refcnt from swap-cache to existing swap entries. But implementation itself is not changed under this patch. At adding swapcache_free(), memcg's hook code is moved under swapcache_free(). This is better than using scattered hooks. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: NBalbir Singh <balbir@in.ibm.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 2月, 2009 1 次提交
-
-
由 Alan Jenkins 提交于
http://bugzilla.kernel.org/show_bug.cgi?id=12239 The image writing code dropped a reference to the current swap device. This doesn't show up if the hibernation succeeds - because it doesn't affect the image which gets resumed. But it means multiple _failed_ hibernations end up freeing the swap device while it is still use! swsusp_write() finds the block device for the swap file using swap_type_of(). It then uses blkdev_get() / blkdev_put() to open and close the block device. Unfortunately, blkdev_get() assumes ownership of the inode of the block_device passed to it. So blkdev_put() calls iput() on the inode. This is by design and other callers expect this behaviour. The fix is for swap_type_of() to take a reference on the inode using bdget(). Signed-off-by: NAlan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-