1. 06 Feb, 2008 1 commit
    • swapin needs gfp_mask for loop on tmpfs · 02098fea
      Hugh Dickins committed
      Building in a filesystem on a loop device on a tmpfs file can hang when
      swapping, the loop thread caught in that infamous throttle_vm_writeout.
      
      In theory this is a long standing problem, which I've either never seen in
      practice, or long ago suppressed the recollection, after discounting my load
      and my tmpfs size as unrealistically high.  But now, with the new aops, it has
      become easy to hang on one machine.
      
      Loop used to grab_cache_page before the old prepare_write to tmpfs, which
      seems to have been enough to free up some memory for any swapin needed; but
      the new write_begin lets tmpfs find or allocate the page (much nicer, since
      grab_cache_page missed tmpfs pages in swapcache).
      
      When allocating a fresh page, tmpfs respects loop's mapping_gfp_mask, which
      has __GFP_IO|__GFP_FS stripped off, and throttle_vm_writeout is designed to
      break out when __GFP_IO or __GFP_FS is unset; but when tmpfs swaps in,
      read_swap_cache_async allocates with GFP_HIGHUSER_MOVABLE regardless of the
      mapping_gfp_mask - hence the hang.
      
      So, pass gfp_mask down the line from shmem_getpage to shmem_swapin to
      swapin_readahead to read_swap_cache_async to add_to_swap_cache.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
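The effect of passing gfp_mask down the swapin path can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: the flag values and every `demo_*` / `DEMO_*` name are hypothetical, and the throttle is reduced to a predicate that may stall only when both the IO and FS bits are set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only (not the kernel's values). */
#define DEMO_GFP_IO  0x1u
#define DEMO_GFP_FS  0x2u
#define DEMO_GFP_HIGHUSER_MOVABLE (DEMO_GFP_IO | DEMO_GFP_FS | 0x4u)

/* Models the throttle: it breaks out when either IO or FS is unset,
 * so it can stall only when both are allowed. */
static bool demo_may_throttle(unsigned int gfp_mask)
{
	unsigned int both = DEMO_GFP_IO | DEMO_GFP_FS;

	return (gfp_mask & both) == both;
}

/* Old behaviour: swapin ignores the caller's mask and always
 * allocates with the full user mask. */
static bool demo_swapin_old(unsigned int mapping_gfp_mask)
{
	(void)mapping_gfp_mask;
	return demo_may_throttle(DEMO_GFP_HIGHUSER_MOVABLE);
}

/* New behaviour: the caller's mask reaches the allocation. */
static bool demo_swapin_new(unsigned int mapping_gfp_mask)
{
	return demo_may_throttle(mapping_gfp_mask);
}
```

With a loop-style mask that has both bits stripped, `demo_swapin_old` can still stall while `demo_swapin_new` cannot, which is the distinction the commit message draws.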
  2. 30 Jul, 2007 1 commit
  3. 17 Jul, 2007 1 commit
  4. 08 May, 2007 1 commit
  5. 06 Jan, 2007 1 commit
  6. 09 Dec, 2006 1 commit
  7. 08 Dec, 2006 5 commits
  8. 30 Sep, 2006 1 commit
  9. 28 Aug, 2006 1 commit
  10. 01 Jul, 2006 1 commit
  11. 23 Jun, 2006 5 commits
    • [PATCH] read_mapping_page for address space · 090d2b18
      Pekka Enberg committed
      Add read_mapping_page() which is used for callers that pass
      mapping->a_ops->readpage as the filler for read_cache_page.  This removes
      some duplication from filesystem code.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
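The shape of such a wrapper can be sketched with mock types. Everything below is a user-space stand-in: the `demo_*` structures, the single-slot "cache", and `demo_readpage` are hypothetical and only mirror the idea of forwarding the mapping's own readpage as the filler.

```c
#include <assert.h>
#include <stddef.h>

struct demo_page { unsigned long index; int uptodate; };

/* Filler callback type: fills one page, returns 0 on success. */
typedef int demo_filler_t(void *data, struct demo_page *page);

struct demo_aops { demo_filler_t *readpage; };

struct demo_mapping {
	const struct demo_aops *a_ops;
	struct demo_page slot;          /* one-page stand-in for a page cache */
};

/* Example filler that a filesystem might install as its readpage. */
static int demo_readpage(void *data, struct demo_page *page)
{
	(void)data;
	page->uptodate = 1;
	return 0;
}

/* Generic helper: the caller chooses the filler explicitly. */
static struct demo_page *demo_read_cache_page(struct demo_mapping *mapping,
					      unsigned long index,
					      demo_filler_t *filler, void *data)
{
	struct demo_page *page = &mapping->slot;

	page->index = index;
	page->uptodate = 0;
	if (filler(data, page) != 0)
		return NULL;
	return page;
}

/* Wrapper: always uses the mapping's own readpage as the filler. */
static struct demo_page *demo_read_mapping_page(struct demo_mapping *mapping,
						unsigned long index, void *data)
{
	return demo_read_cache_page(mapping, index, mapping->a_ops->readpage, data);
}
```

Callers that previously spelled out `mapping->a_ops->readpage` at every call site only need the three-argument wrapper.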
    • [PATCH] swapoff: use atomic_inc_not_zero() on mm_users · 70af7c5c
      Hugh Dickins committed
      Now that we have atomic_inc_not_zero, it's more elegant for try_to_unuse to
      use that on mm_users: doesn't actually matter at present, but safer to be
      sure that once mm_users has gone to 0, nothing raises it for an instant.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
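The "increment unless zero" operation can be sketched with C11 atomics. This is a user-space illustration, not the kernel's atomic_inc_not_zero; the name `demo_inc_not_zero` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero; return true if the increment happened.
 * Once the count has dropped to 0 it is never raised again, even briefly. */
static bool demo_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}
```

A plain increment followed by a check would momentarily raise a dead count from 0 to 1; the compare-and-swap loop avoids that window.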
    • [PATCH] Swapless page migration: rip out swap based logic · d75a0fcd
      Christoph Lameter committed
      Rip the page migration logic out.
      
      Remove all code that has to do with swapping during page migration.
      
      This also guts the ability to migrate pages to swap.  No one used that so let's
      let it go for good.
      
      Page migration should be a bit broken after this patch.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Christoph Lameter committed
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.
      
      We check for migration ptes in do_swap_page and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to a page.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
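Two of the fixes folded into this commit can be illustrated in a small user-space sketch: passing a swap type where a boolean "write" is expected, and range-checking a swap type once the count is no longer a power of two. The shift of 5 is an assumption chosen to be consistent with SWP_MIGRATION_READ being 30; all `demo_*` / `DEMO_*` names and the entry layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SWAPFILES_SHIFT 5   /* assumed: 32 type slots in total */
#define DEMO_MAX_SWAPFILES   ((1 << DEMO_SWAPFILES_SHIFT) - 2)  /* 30 */
#define DEMO_MIGRATION_READ  DEMO_MAX_SWAPFILES                 /* 30 */
#define DEMO_MIGRATION_WRITE (DEMO_MAX_SWAPFILES + 1)           /* 31 */

struct demo_entry { unsigned int type; unsigned long offset; };

/* The second argument is a boolean "write", not a swap type. */
static struct demo_entry demo_make_migration_entry(unsigned long pfn, bool write)
{
	struct demo_entry e = {
		write ? DEMO_MIGRATION_WRITE : DEMO_MIGRATION_READ, pfn
	};
	return e;
}

static bool demo_is_write_migration_entry(struct demo_entry e)
{
	return e.type == DEMO_MIGRATION_WRITE;
}

/* Downgrade an entry to read-only without rebuilding it. */
static void demo_make_migration_entry_read(struct demo_entry *e)
{
	e->type = DEMO_MIGRATION_READ;
}

/* Compare against the count directly: a mask derived from the shift
 * would wrongly accept types 30 and 31. */
static bool demo_swap_type_valid(unsigned int type)
{
	return type < DEMO_MAX_SWAPFILES;
}
```

Passing `DEMO_MIGRATION_READ` (30) as the second argument converts to `true` and silently produces a write entry, which is the pitfall described for copy_one_pte; testing with the is-write helper and downgrading with the make-read helper avoids it.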
    • [PATCH] migration: remove unnecessary PageSwapCache checks · 3c5a87f4
      Christoph Lameter committed
      Remove two unnecessary PageSwapCache checks.  The page refcount is raised
      and therefore page migration cannot occur in both functions.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 01 Apr, 2006 1 commit
  13. 23 Mar, 2006 2 commits
    • [PATCH] swsusp: userland interface · 6e1819d6
      Rafael J. Wysocki committed
      This patch introduces a user space interface for swsusp.
      
      The interface is based on a special character device, called the snapshot
      device, that allows user space processes to perform suspend and resume-related
      operations with the help of some ioctls and the read()/write() functions.
      Additionally it allows these processes to allocate free swap pages from a
      selected swap partition, called the resume partition, so that they know which
      sectors of the resume partition are available to them.
      
      The interface uses the same low-level system memory snapshot-handling
      functions that are used by the built-in swap-writing/reading code of swsusp.
      
      The interface documentation is included in the patch.
      
      The patch assumes that the major and minor numbers of the snapshot device will
      be 10 (i.e. misc device) and 231, the registration of which has already been
      requested.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: low level interface · f577eb30
      Rafael J. Wysocki committed
      Introduce the low level interface that can be used for handling the
      snapshot of the system memory by the in-kernel swap-writing/reading code of
      swsusp and the userland interface code (to be introduced shortly).
      
      Also change the way in which swsusp records the allocated swap pages and,
      consequently, simplify the in-kernel swap-writing/reading code (this is
      necessary for the userland interface too).  To this end, introduce two
      helper functions in mm/swapfile.c, so that the swsusp code does not refer
      directly to the swap internals.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  14. 22 Mar, 2006 1 commit
  15. 02 Feb, 2006 2 commits
  16. 19 Jan, 2006 1 commit
  17. 12 Jan, 2006 1 commit
  18. 11 Jan, 2006 1 commit
  19. 10 Jan, 2006 1 commit
  20. 09 Jan, 2006 1 commit
  21. 07 Jan, 2006 1 commit
  22. 07 Nov, 2005 1 commit
  23. 30 Oct, 2005 4 commits
    • [PATCH] mm: split page table lock · 4c21e2f2
      Hugh Dickins committed
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: pte_offset_map_lock loops · 705e87c0
      Hugh Dickins committed
      Convert those common loops using page_table_lock on the outside and
      pte_offset_map within to use just pte_offset_map_lock within instead.
      
      These all hold mmap_sem (some exclusively, some not), so at no level can a
      page table be whipped away from beneath them.  But whereas pte_alloc loops
      tested with the "atomic" pmd_present, these loops are testing with pmd_none,
      which on i386 PAE tests both lower and upper halves.
      
      That's now unsafe, so add a cast into pmd_none to test only the vital lower
      half: we lose a little sensitivity to a corrupt middle directory, but not
      enough to worry about.  It appears that i386 and UML were the only
      architectures vulnerable in this way, and pgd and pud no problem.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: rss = file_rss + anon_rss · 4294621f
      Hugh Dickins committed
      I was lazy when we added anon_rss, and chose to change as few places as
      possible.  So currently each anonymous page has to be counted twice, in rss
      and in anon_rss.  Which won't be so good if those are atomic counts in some
      configurations.
      
      Change that around: keep file_rss and anon_rss separately, and add them
      together (with get_mm_rss macro) when the total is needed - reading two
      atomics is much cheaper than updating two atomics.  And update anon_rss
      upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
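The accounting change can be sketched with C11 atomics: each page updates exactly one counter, and the total is computed on demand. The structure and the `demo_*` helpers are hypothetical stand-ins, not the kernel's mm_struct or its get_mm_rss macro.

```c
#include <assert.h>
#include <stdatomic.h>

struct demo_mm {
	atomic_long file_rss;   /* pages backed by files */
	atomic_long anon_rss;   /* anonymous pages */
};

/* Each new page touches exactly one counter. */
static void demo_add_file_page(struct demo_mm *mm)
{
	atomic_fetch_add(&mm->file_rss, 1);
}

static void demo_add_anon_page(struct demo_mm *mm)
{
	atomic_fetch_add(&mm->anon_rss, 1);
}

/* The total is derived when needed: two atomic reads
 * instead of two atomic updates per anonymous page. */
static long demo_get_mm_rss(struct demo_mm *mm)
{
	return atomic_load(&mm->file_rss) + atomic_load(&mm->anon_rss);
}
```

The trade-off is that the sum is not a single atomic snapshot of both counters, which is acceptable for a statistic that is read far less often than it is updated.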
    • [PATCH] mm: anon is already wrprotected · 72866f6f
      Hugh Dickins committed
      do_anonymous_page's pte_wrprotect causes some confusion: in such a case,
      vm_page_prot must already be forcing COW, so must omit write permission, and
      so the pte_wrprotect is redundant.  Replace it by a comment to that effect,
      and reword the comment on unuse_pte which also caused confusion.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  24. 23 Sep, 2005 1 commit
    • [PATCH] Fix bd_claim() error code. · f7b3a435
      Rob Landley committed
      Problem: In some circumstances, bd_claim() is returning the wrong error
      code.
      
      If we try to swapon an unused block device that isn't swap formatted, we
      get -EINVAL.  But if that same block device is already mounted, we instead
      get -EBUSY, even though it still isn't a valid swap device.
      
      This issue came up on the busybox list trying to get the error message
      from "swapon -a" right.  If a swap device is already enabled, we get -EBUSY,
      and we shouldn't report this as an error.  But we can't distinguish the two
      -EBUSY conditions, which are very different errors.
      
      In the code, bd_claim() returns either 0 or -EBUSY, but in this case busy
      means "somebody other than sys_swapon has already claimed this", and
      _that_ means this block device can't be a valid swap device.  So return
      -EINVAL there.
      Signed-off-by: Rob Landley <rob@landley.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
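The resulting error mapping can be sketched as follows. This is a simplified user-space model: the `demo_*` functions, their integer flags, and the order of the checks are assumptions made for illustration, not the actual sys_swapon code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for claiming a block device:
 * 0 on success, -EBUSY if someone else already holds it. */
static int demo_bd_claim(int claimed_by_other)
{
	return claimed_by_other ? -EBUSY : 0;
}

static int demo_swapon(int already_swap, int claimed_by_other,
		       int has_swap_signature)
{
	if (already_swap)
		return -EBUSY;   /* already enabled as swap: benign for "swapon -a" */

	if (demo_bd_claim(claimed_by_other) < 0)
		return -EINVAL;  /* held by another user, so not a usable swap device */

	if (!has_swap_signature)
		return -EINVAL;  /* not swap formatted */

	return 0;
}
```

After the change, -EBUSY has a single meaning for the caller (the device is already in use as swap), so a tool can ignore it safely while still reporting -EINVAL.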
  25. 11 Sep, 2005 1 commit
  26. 05 Sep, 2005 2 commits
    • [PATCH] swap: swap_lock replace list+device · 5d337b91
      Hugh Dickins committed
      The idea of a swap_device_lock per device, and a swap_list_lock over them all,
      is appealing; but in practice almost every holder of swap_device_lock must
      already hold swap_list_lock, which defeats the purpose of the split.
      
      The only exceptions have been swap_duplicate, valid_swaphandles and an
      untrodden path in try_to_unuse (plus a few places added in this series).
      valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
      demand attention.  However, with the hold time in get_swap_pages so much
      reduced, I've not yet found a load and set of swap device priorities to show
      even swap_duplicate benefitting from the split.  Certainly the split is mere
      overhead in the common case of a single swap device.
      
      So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
      (generally we seem to prefer an _ in the name, and not hide in a macro).
      
      If someone can show a regression in swap_duplicate, then probably we should
      add a hashlock for the swap_map entries alone (shorts being anatomic), so as
      to help the case of the single swap device too.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: scan_swap_map latency breaks · 048c27fd
      Hugh Dickins committed
      The get_swap_page/scan_swap_map latency can be so bad that even those without
      preemption configured deserve relief: periodically cond_resched.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
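A periodic latency break in a long scan loop can be sketched like this. The limit of 256, the counter, and the `demo_*` names are assumptions for illustration; `demo_cond_resched` merely counts calls where a kernel would offer to yield the CPU.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LATENCY_LIMIT 256   /* hypothetical budget of entries per break */

static unsigned long demo_resched_calls;

/* Stand-in for a voluntary reschedule point. */
static void demo_cond_resched(void)
{
	demo_resched_calls++;
}

/* Return the index of the first free (zero) entry, or n if none,
 * yielding once every DEMO_LATENCY_LIMIT entries scanned. */
static size_t demo_scan_map(const unsigned short *map, size_t n)
{
	int latency_ration = DEMO_LATENCY_LIMIT;

	for (size_t i = 0; i < n; i++) {
		if (map[i] == 0)
			return i;
		if (--latency_ration <= 0) {
			demo_cond_resched();
			latency_ration = DEMO_LATENCY_LIMIT;
		}
	}
	return n;
}
```

Counting down a ration keeps the common path to a decrement and a compare, so short scans pay almost nothing for the break.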