- 23 3月, 2011 27 次提交
-
-
由 Cesar Eduardo Barros 提交于
A conflict between 52c50567 ("mm: swap: unlock swapfile inode mutex before closing file on bad swapfiles") and 83ef99be ("sys_swapon: remove did_down variable") caused a double unlock of the inode mutex (once in bad_swap: before the filp_close, once at the end just before returning). The patch which added the extra unlock cleared did_down to avoid unlocking twice, but the other patch removed the did_down variable. To fix, set inode to NULL after the first unlock, since it will be used after that point only for the final unlock. While checking this patch, I found a path which could unlock without locking, in case the same inode was added as a swapfile twice. To fix, move the setting of the inode variable further down, to just before claim_swapfile, which will lock the inode before doing anything else. Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Eric B Munson <emunson@mgebm.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
scan_swap_map() is a large function (224 lines), with several loops and a complex control flow involving several gotos. Given all that, it is a bit silly that it is marked as inline. The compiler agrees with me: on a x86-64 compile, it did not inline the function. Remove the "inline" and let the compiler decide instead. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
The block in sys_swapon which does the final adjustments to the swap_info_struct and to swap_list is the same as the block which re-inserts it again at sys_swapoff on failure of try_to_unuse(). Move this code to a separate function, and use it both in sys_swapon and sys_swapoff. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
The block in sys_swapon which does the final adjustments to the swap_info_struct and to swap_list is the same as the block which re-inserts it again at sys_swapoff on failure of try_to_unuse(), except for the order of the operations within the lock. Since the order should not matter, arbitrarily change sys_swapoff to match sys_swapon, in preparation to making both share the same code. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
The block in sys_swapon which does the final adjustments to the swap_info_struct and to swap_list is the same as the block which re-inserts it again at sys_swapoff on failure of try_to_unuse(). To be able to make both share the same code, move the printk() call in the middle of it to just after it. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
It still exists within setup_swap_map_and_extents(), but after it nr_good_pages == p->pages. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Move the code which parses the bad block list and the extents to a separate function. Only code movement, no functional changes. This change uses the fact that, after the success path, nr_good_pages == p->pages. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
The call to swap_cgroup_swapon is in the middle of loading the swap map and extents. As it only does memory allocation and does not depend on the swapfile layout (map/extents), it can be called earlier (or later). Move it to just after the allocation of swap_map, since it is conceptually similar (allocates a map). Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Move the code which parses and checks the swapfile header (except for the bad block list) to a separate function. Only code movement, no functional changes. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
There is no reason I can see to read inode->i_size long before it is needed. Move its read to just before it is needed, to reduce the variable lifetime. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NJesper Juhl <jj@chaosbits.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Move the code which claims the bdev (S_ISBLK) or locks the inode (S_ISREG) to a separate function. Only code movement, no functional changes. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
sys_swapon currently has two error labels, bad_swap and bad_swap_2. bad_swap does the same as bad_swap_2 plus destroy_swap_extents() and swap_cgroup_swapoff(); both are noops in the places where bad_swap_2 is jumped to. With a single extra test for inode (matching the one in the S_ISREG case below), all the error paths in the function can go to bad_swap. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
The only way error is 0 in the cleanup blocks is when the function is returning successfully. In this case, the cleanup blocks were setting S_SWAPFILE in the S_ISREG case. But this is not a cleanup. Move the setting of S_SWAPFILE to just before the "goto out;" to make this more clear. At this point, we do not need to test for inode because it will never be NULL. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
The bdev variable is always equivalent to (S_ISBLK(inode->i_mode) ? p->bdev : NULL), as long as it being set is moved to a bit earlier. Use this fact to remove the bdev variable. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Move the setting of the error variable nearer the goto in a few places. Avoids calling PTR_ERR() if not IS_ERR() in two places, and makes the error condition more explicit in two other places. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NJesper Juhl <jj@chaosbits.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Since mutex_lock(&inode->i_mutex) is called just after setting inode, did_down is always equivalent to (inode && S_ISREG(inode->i_mode)). Use this fact to remove the did_down variable. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Now there is nothing which jumps to the cleanup blocks before the name variable is set. There is no need to set it initially to NULL anymore. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Since there is no cleanup to do, there is no reason to jump to a label. Return directly instead. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
At this point in sys_swapon, there is nothing to free. Return directly instead of jumping to the cleanup block at the end of the function. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Move the swap_info allocation to its own function. Only code movement, no functional changes. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Within sys_swapon, after the swap_info entry has been allocated, we always have type == p->type and swap_info[type] == p. Use this fact to reduce the dependency on the "type" local variable within the function, as a preparation to move the allocation of the swap_info entry to a separate function. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
Changelogs belong in the git history instead of in the source code. Also, "The swapon system call" is redundant with "SYSCALL_DEFINE2(swapon, ...)". Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NJesper Juhl <jj@chaosbits.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> [ Gaah. That's a _historical_ comment. But the patch-series depends on removal ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
This patch series refactors the sys_swapon function. sys_swapon is currently a very large function, with 313 lines (more than 12 25-line screens), which can make it a bit hard to read. This patch series reduces this size by half, by extracting large chunks of related code to new helper functions. One of these chunks of code was nearly identical to the part of sys_swapoff which is used in case of a failure return from try_to_unuse(), so this patch series also makes both share the same code. As a side effect of all this refactoring, the compiled code gets a bit smaller (from v1 of this patch series): text data bss dec hex filename 14012 944 276 15232 3b80 mm/swapfile.o.before 13941 944 276 15161 3b39 mm/swapfile.o.after This patch: Use vzalloc() instead of vmalloc/memset. Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Tested-by: NEric B Munson <emunson@mgebm.net> Acked-by: NEric B Munson <emunson@mgebm.net> Reviewed-by: NPekka Enberg <penberg@kernel.org> Reviewed-by: NJesper Juhl <jj@chaosbits.net> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
If an administrator tries to swapon a file backed by NFS, the inode mutex is taken (as it is for any swapfile) but later identified to be a bad swapfile due to the lack of bmap and tries to cleanup. During cleanup, an attempt is made to close the file but with inode->i_mutex still held. Closing an NFS file syncs it which tries to acquire the inode mutex leading to deadlock. If lockdep is enabled the following appears on the console; ============================================= [ INFO: possible recursive locking detected ] 2.6.38-rc8-autobuild #1 --------------------------------------------- swapon/2192 is trying to acquire lock: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c but task is already holding lock: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7 other info that might help us debug this: 1 lock held by swapon/2192: #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7 stack backtrace: Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1 Call Trace: __lock_acquire+0x2eb/0x1623 find_get_pages_tag+0x14a/0x174 pagevec_lookup_tag+0x25/0x2e vfs_fsync_range+0x47/0x7c lock_acquire+0xd3/0x100 vfs_fsync_range+0x47/0x7c nfs_flush_one+0x0/0xdf [nfs] mutex_lock_nested+0x40/0x2b1 vfs_fsync_range+0x47/0x7c vfs_fsync_range+0x47/0x7c vfs_fsync+0x1c/0x1e nfs_file_flush+0x64/0x69 [nfs] filp_close+0x43/0x72 sys_swapon+0xa39/0xae7 sysret_check+0x2e/0x69 system_call_fastpath+0x16/0x1b This patch releases the mutex if its held before calling filep_close() so swapon fails as expected without deadlock when the swapfile is backed by NFS. If accepted for 2.6.39, it should also be considered a -stable candidate for 2.6.38 and 2.6.37. Signed-off-by: NMel Gorman <mgorman@suse.de> Acked-by: NHugh Dickins <hughd@google.com> Cc: <stable@kernel.org> [2.6.37+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 2月, 2011 1 次提交
-
-
由 Miklos Szeredi 提交于
Grab a reference to bdev before calling blkdev_get(), which expects the refcount to be already incremented and either returns success or decrements the refcount and returns an error. The bug was introduced by e525fd89 (block: make blkdev_get/put() handle exclusive access), which didn't take into account this behavior of blkdev_get(). Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2011 1 次提交
-
-
由 Andrea Arcangeli 提交于
Paging logic that splits the page before it is unmapped and added to swap to ensure backwards compatibility with the legacy swap code. Eventually swap should natively pageout the hugepages to increase performance and decrease seeking and fragmentation of swap space. swapoff can just skip over huge pmd as they cannot be part of swap yet. In add_to_swap be careful to split the page only if we got a valid swap entry so we don't split hugepages with a full swap. In theory we could split pages before isolating them during the lru scan, but for khugepaged to be safe, I'm relying on either mmap_sem write mode, or PG_lock taken, so split_huge_page has to run either with mmap_sem read/write mode or PG_lock taken. Calling it from isolate_lru_page would make locking more complicated, in addition to that split_huge_page would deadlock if called by __isolate_lru_page because it has to take the lru lock to add the tail pages. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 11月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NNeil Brown <neilb@suse.de> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>
-
- 27 10月, 2010 1 次提交
-
-
由 Kay Sievers 提交于
System management wants to subscribe to changes in swap configuration. Make /proc/swaps pollable like /proc/mounts. [akpm@linux-foundation.org: document proc_poll_event] Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Acked-by: NGreg KH <greg@kroah.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 9月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 9月, 2010 5 次提交
-
-
由 Christoph Hellwig 提交于
The swap code already uses synchronous discards, no need to add I/O barriers. tj: superflous newlines removed. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NHugh Dickins <hughd@google.com> Tested-by: NNigel Cunningham <nigel@tuxonice.net> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Hugh Dickins 提交于
Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs show a shift since I last tested in December: in part because of firmware updates, in part because of the necessary move from barriers to awaiting completion at the block layer. While discard at swapon still shows as slightly beneficial on both, discarding 1MB swap cluster when allocating is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV). Surrender: discard as presently implemented is more hindrance than help for swap; but might prove useful on other devices, or with improvements. So continue to do the discard at swapon, but make discard while swapping conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using only the lower 16 bits of int flags). We can add a --discard or -d to swapon(8), and a "discard" to swap in /etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
The swap code already uses synchronous discards, no need to add I/O barriers. This fixes the worst of the terrible slowdown in swap allocation for hibernation, reported on 2.6.35 by Nigel Cunningham; but does not entirely eliminate that regression. [tj@kernel.org: superflous newlines removed] Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NNigel Cunningham <nigel@tuxonice.net> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Move the hibernation check from scan_swap_map() into try_to_free_swap(): to catch not only the common case when hibernation's allocation itself triggers swap reuse, but also the less likely case when concurrent page reclaim (shrink_page_list) might happen to try_to_free_swap from a page. Hibernation already clears __GFP_IO from the gfp_allowed_mask, to stop reclaim from going to swap: check that to prevent swap reuse too. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Please revert 2.6.36-rc commit d2997b10 "hibernation: freeze swap at hibernation". It complicated matters by adding a second swap allocation path, just for hibernation; without in any way fixing the issue that it was intended to address - page reclaim after fixing the hibernation image might free swap from a page already imaged as swapcache, letting its swap be reallocated to store a different page of the image: resulting in data corruption if the imaged page were freed as clean then swapped back in. Pages freed to si->swap_map were still in danger of being reallocated by the alternative allocation path. I guess it inadvertently fixed slow SSD swap allocation for hibernation, as reported by Nigel Cunningham: by missing out the discards that occur on the usual swap allocation path; but that was unintentional, and needs a separate fix. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 8月, 2010 2 次提交
-
-
由 KAMEZAWA Hiroyuki 提交于
When taking a memory snapshot in hibernate_snapshot(), all (directly called) memory allocations use GFP_ATOMIC. Hence swap misusage during hibernation never occurs. But from a pessimistic point of view, there is no guarantee that no page allcation has __GFP_WAIT. It is better to have a global indication "we enter hibernation, don't use swap!". This patch tries to freeze new-swap-allocation during hibernation. (All user processes are frozenm so swapin is not a concern). This way, no updates will happen to swap_map[] between hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free() is called. We can be assured that swap corruption will not occur. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
Since 2.6.31, swap_map[]'s refcounting was changed to show that a used swap entry is just for swap-cache, can be reused. Then, while scanning free entry in swap_map[], a swap entry may be able to be reclaimed and reused. It was caused by commit c9e44410 ("mm: reuse unused swap entry if necessary"). But this caused deta corruption at resume. The scenario is - Assume a clean-swap cache, but mapped. - at hibernation_snapshot[], clean-swap-cache is saved as clean-swap-cache and swap_map[] is marked as SWAP_HAS_CACHE. - then, save_image() is called. And reuse SWAP_HAS_CACHE entry to save image, and break the contents. After resume: - the memory reclaim runs and finds clean-not-referenced-swap-cache and discards it because it's marked as clean. But here, the contents on disk and swap-cache is inconsistent. Hance memory is corrupted. This patch avoids the bug by not reclaiming swap-entry during hibernation. This is a quick fix for backporting. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: NOndreg Zary <linux@rainbow-software.org> Tested-by: NOndreg Zary <linux@rainbow-software.org> Tested-by: NAndrea Gelmini <andrea.gelmini@gmail.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 5月, 2010 1 次提交
-
-
由 Nitin Gupta 提交于
This callback is required when RAM based devices are used as swap disks. One such device is ramzswap which is used as compressed in-memory swap disk. For such devices, we need a callback as soon as a swap slot is no longer used to allow freeing memory allocated for this slot. Without this callback, stale data can quickly accumulate in memory defeating the whole purpose of such devices. Signed-off-by: NNitin Gupta <ngupta@vflare.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NNigel Cunningham <nigel@tuxonice.net> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-