1. 17 October 2007, 1 commit
  2. 20 July 2007, 2 commits
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      20c2df83
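      With the dtor argument gone, a cache is set up with a constructor only.  A
      minimal sketch of the resulting call shape, assuming the five-argument
      kmem_cache_create() and the three-argument constructor prototype of that
      era (struct my_object and its init are illustrative):

      #include <linux/slab.h>

      struct my_object {
              int state;
      };

      /* Constructor only: run when a slab page is populated, never on free. */
      static void my_object_ctor(void *obj, struct kmem_cache *cache,
                                 unsigned long flags)
      {
              struct my_object *o = obj;

              o->state = 0;
      }

      static struct kmem_cache *my_cache;

      static int __init my_cache_init(void)
      {
              /* The trailing dtor argument no longer exists. */
              my_cache = kmem_cache_create("my_object", sizeof(struct my_object),
                                           0, SLAB_HWCACHE_ALIGN, my_object_ctor);
              return my_cache ? 0 : -ENOMEM;
      }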
    • mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Committed by Nick Piggin
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should not need to know anything about the virtual memory mapping.  The
      hitch here is that the ->nopage handler didn't pass down enough
      information (i.e. pgoff).  But it is more logical to pass pgoff rather
      than have the ->nopage function calculate it itself anyway (because
      that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54cb8821
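      The replacement interface hangs a single ->fault method off
      vm_operations_struct and hands it the pgoff directly.  A hedged sketch of
      a driver-style handler against the interface described here (my_vm_fault,
      backing_page and the exact vm_fault field names are illustrative of that
      era's API):

      #include <linux/mm.h>

      static struct page *backing_page;       /* one page owned by the driver */

      static int my_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
      {
              /* pgoff is handed down by the VM; the handler no longer has to
               * derive it from the faulting address as ->nopage users did. */
              if (vmf->pgoff != 0)
                      return VM_FAULT_SIGBUS;

              get_page(backing_page);
              vmf->page = backing_page;       /* the VM installs the pte */
              return 0;
      }

      static struct vm_operations_struct my_vm_ops = {
              .fault = my_vm_fault,
      };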
  3. 29 June 2007, 1 commit
    • mm: kill validate_anon_vma to avoid mapcount BUG · 30acbaba
      Committed by Hugh Dickins
      validate_anon_vma gave a useful check on the integrity of the anon_vma list
      when Andrea was developing obj rmap; but it was not enabled in SLES9
      itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
      configurable CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that
      its BUG_ON(mapcount > 100000) can easily crash a CONFIG_DEBUG_VM=y system.
      
      That limit was just an arbitrary number to protect against an infinite
      loop.  We could raise it to something enormous (depending on sizeof struct
      vma and size of memory?); but I rather think validate_anon_vma has outlived
      its usefulness, and is better just removed - which gives a magnificent
      performance boost to anything like Petr's test program ;)
      
      Of course, a very long anon_vma list is bad news for preemption latency,
      and I believe there has been one recent report of such: let's not forget
      that, but validate_anon_vma only makes it worse not better.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Petr Vandrovec <petr@vmware.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30acbaba
  4. 17 May 2007, 2 commits
    • mm: more rmap checking · c97a9e10
      Committed by Nick Piggin
      Re-introduce rmap verification patches that Hugh removed when he removed
      PG_map_lock. PG_map_lock actually isn't needed to synchronise access to
      anonymous pages, because PG_locked and PTL together already do.
      
      These checks were important in discovering and fixing a rare rmap corruption
      in SLES9.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c97a9e10
    • Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Committed by Christoph Lameter
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a35afb83
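      For a typical filesystem inode cache the change is purely mechanical: the
      always-true flag test around the constructor body goes away.  A hedged
      before/after sketch (struct my_inode_info is illustrative;
      inode_init_once() is the usual helper):

      /* Before: the flag was tested although it was always set. */
      static void init_once_old(void *foo, struct kmem_cache *cachep,
                                unsigned long flags)
      {
              struct my_inode_info *ei = foo;

              if (flags & SLAB_CTOR_CONSTRUCTOR)
                      inode_init_once(&ei->vfs_inode);
      }

      /* After: the constructor body runs unconditionally. */
      static void init_once(void *foo, struct kmem_cache *cachep,
                            unsigned long flags)
      {
              struct my_inode_info *ei = foo;

              inode_init_once(&ei->vfs_inode);
      }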
  5. 09 May 2007, 1 commit
  6. 08 May 2007, 1 commit
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Committed by Christoph Lameter
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback that verifies the object is
      back in its constructor state before it is freed?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that
      manipulates the object.
      
      Also, the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there were code in a constructor
      handling SLAB_DEBUG_INITIAL, it would have to be conditional on
      SLAB_DEBUG, otherwise it would just be dead code.  But there is no such
      code in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to
      make real use of, difficult to understand, and there are easier ways to
      accomplish the same effect (i.e. add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even
      be pointless without removal of SLAB_DEBUG_INITIAL) from the fs
      constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50953fe9
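      The suggested replacement for the flag is an explicit check at the free
      site.  A minimal hedged sketch of that pattern (struct my_object, its
      fields and the expected "constructor state" are illustrative):

      #include <linux/slab.h>
      #include <linux/list.h>

      struct my_object {
              struct list_head list;
              int refcount;
      };

      static void my_object_free(struct kmem_cache *cache, struct my_object *o)
      {
              /* Verify "back to constructor state" by hand, instead of relying
               * on a SLAB_DEBUG_INITIAL re-run of the constructor. */
              WARN_ON(!list_empty(&o->list));
              WARN_ON(o->refcount != 0);

              kmem_cache_free(cache, o);
      }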
  7. 27 April 2007, 1 commit
    • [S390] split page_test_and_clear_dirty. · 6c210482
      Committed by Martin Schwidefsky
      The page_test_and_clear_dirty primitive really consists of two
      operations, page_test_dirty and page_clear_dirty.  The combination of
      the two is not an atomic operation, so it makes more sense to have two
      separate operations instead of one.
      In addition to the improved readability of the s390 version of
      SetPageUptodate, it now avoids the page_test_dirty operation, an
      insert-storage-key-extended (iske) instruction, which is expensive.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      6c210482
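      With the primitive split, callers test first and clear only when the
      storage key was actually dirty.  A hedged sketch of the resulting call
      pattern (the wrapper function is illustrative):

      #include <linux/mm.h>
      #include <linux/page-flags.h>

      /* Old, combined form:  if (page_test_and_clear_dirty(page))
       *                              set_page_dirty(page);
       * New, split form: */
      static void transfer_storage_key_dirty(struct page *page)
      {
              if (page_test_dirty(page)) {
                      page_clear_dirty(page);
                      set_page_dirty(page);
              }
      }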
  8. 04 April 2007, 1 commit
    • [S390] page_mkclean data corruption. · 6e1beb3c
      Committed by Martin Schwidefsky
      Git commit c2fda5fe, which
      added the page_test_and_clear_dirty call to page_mkclean, and
      git commit 7658cc28, which fixes
      the "nasty and subtle race in shared mmap'ed page writeback"
      problem in clear_page_dirty_for_io, cause data corruption on s390.
      
      The effect of the two changes is that for every call to
      clear_page_dirty_for_io a page_test_and_clear_dirty is done.  If
      the per-page dirty bit is set, set_page_dirty is called.  Strangely,
      clear_page_dirty_for_io is called for not-uptodate pages, e.g.
      via this call chain:
      
       [<000000000007c0f2>] clear_page_dirty_for_io+0x12a/0x130
       [<000000000007c494>] generic_writepages+0x258/0x3e0
       [<000000000007c692>] do_writepages+0x76/0x7c
       [<00000000000c7a26>] __writeback_single_inode+0xba/0x3e4
       [<00000000000c831a>] sync_sb_inodes+0x23e/0x398
       [<00000000000c8802>] writeback_inodes+0x12e/0x140
       [<000000000007b9ee>] wb_kupdate+0xd2/0x178
       [<000000000007cca2>] pdflush+0x162/0x23c
      
      The bad news now is that page_test_and_clear_dirty might claim
      that a not-uptodate page is dirty since SetPageUptodate which
      resets the per page dirty bit has not yet been called. The page
      writeback that follows clobbers the data on disk.
      
      The simplest solution to this problem is to move the call to
      page_test_and_clear_dirty under the "if (page_mapped(page))".
      If a file backed page is mapped it is uptodate.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      6e1beb3c
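      A hedged, heavily simplified sketch of page_mkclean() after the fix: the
      storage-key dirty test now only runs for mapped pages, which are known to
      be uptodate (page_mkclean_file() stands in for the rest of the real
      function):

      int page_mkclean(struct page *page)
      {
              int ret = 0;

              BUG_ON(!PageLocked(page));

              if (page_mapped(page)) {
                      struct address_space *mapping = page_mapping(page);

                      if (mapping)
                              ret = page_mkclean_file(mapping, page);
                      /* Moved inside the branch: a not-uptodate page can no
                       * longer be reported dirty and clobber data on disk. */
                      if (page_test_and_clear_dirty(page))
                              ret = 1;
              }

              return ret;
      }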
  9. 02 March 2007, 1 commit
  10. 31 December 2006, 1 commit
  11. 23 December 2006, 2 commits
  12. 21 October 2006, 1 commit
  13. 12 October 2006, 1 commit
  14. 26 September 2006, 1 commit
    • [PATCH] mm: tracking shared dirty pages · d08b3851
      Committed by Peter Zijlstra
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows; a VMA is eligible when:
       - it is shared and writable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean(), a new rmap
      call.  It can be called on any page, but is currently only implemented
      for mapped pages; if the page is found to be in a VMA that accounts dirty
      pages, it will also write-protect the PTE.
      
      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d08b3851
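      The eligibility heuristic above maps almost directly onto a predicate.  A
      hedged sketch (the helper name is hypothetical; the last condition,
      f_op->mmap() not having changed the default protection, is expressed as
      an assumption by comparing against the protection the core VM would pick
      for these vm_flags):

      #include <linux/mm.h>
      #include <linux/backing-dev.h>

      static int vma_accounts_shared_dirty(struct vm_area_struct *vma)
      {
              /* shared and writable */
              if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) !=
                  (VM_WRITE | VM_SHARED))
                      return 0;
              /* not a 'special' mapping */
              if (vma->vm_flags & (VM_PFNMAP | VM_INSERTPAGE))
                      return 0;
              /* the backing_dev_info accounts dirty pages */
              if (!vma->vm_file ||
                  !mapping_cap_account_dirty(vma->vm_file->f_mapping))
                      return 0;
              /* default protection still in place (assumption, see above) */
              if (pgprot_val(vma->vm_page_prot) !=
                  pgprot_val(protection_map[vma->vm_flags &
                             (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)]))
                      return 0;
              return 1;
      }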
  15. 01 July 2006, 2 commits
    • [PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED · f3dbd344
      Committed by Christoph Lameter
      The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
      calculation as the number of mapped pagecache pages.  However, that is not
      true.  NR_FILE_MAPPED includes the mapped anonymous pages.  This patch
      separates those and therefore allows an accurate tracking of the anonymous
      pages per zone.
      
      It then becomes possible to determine the number of unmapped pages per zone
      and we can avoid scanning for unmapped pages if there are none.
      
      Also it may now be possible to determine the mapped/unmapped ratio in
      get_dirty_limit.  Isn't the number of anonymous pages irrelevant in that
      calculation?
      
      Note that this will change the meaning of the number of mapped pages
      reported in /proc/vmstat, /proc/meminfo and in the per-node statistics.
      This may affect user space tools that monitor these counters!
      NR_FILE_MAPPED works like NR_FILE_DIRTY.  It is only valid for pagecache
      pages.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f3dbd344
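      In rmap terms the split means anonymous and file mappings now bump
      different per-zone counters.  A hedged sketch of the resulting accounting
      pattern (simplified from the rmap add/remove helpers):

      #include <linux/mm.h>
      #include <linux/vmstat.h>

      static void account_page_mapped(struct page *page, int anon)
      {
              if (anon)
                      __inc_zone_page_state(page, NR_ANON_PAGES);
              else
                      __inc_zone_page_state(page, NR_FILE_MAPPED);
      }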
    • [PATCH] zoned vm counters: convert nr_mapped to per zone counter · 65ba55f5
      Committed by Christoph Lameter
      nr_mapped is important because it allows a determination of how many pages of
      a zone are not mapped, which would allow a more efficient means of determining
      when we need to reclaim memory in a zone.
      
      We take the nr_mapped field out of the page state structure and define a new
      per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
      from NR_MAPPED in the next patch).
      
      We replace the use of nr_mapped in various kernel locations.  This avoids
      looping over all processors in try_to_free_pages(), writeback, and
      reclaim (swap + zone reclaim).
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      65ba55f5
  16. 26 June 2006, 1 commit
  17. 23 June 2006, 5 commits
    • [PATCH] More page migration: use migration entries for file pages · 04e62a29
      Committed by Christoph Lameter
      This implements the use of migration entries to preserve ptes of file backed
      pages during migration.  Processes can therefore be migrated back and forth
      without losing their connection to pagecache pages.
      
      Note that we implement the migration entries only for linear mappings.
      Nonlinear mappings still require the unmapping of the ptes for migration.
      
      And another writepage() ugliness shows up.  writepage() can drop the page
      lock.  Therefore we have to remove migration ptes before calling writepages()
      in order to avoid having migration entries point to unlocked pages.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      04e62a29
    • [PATCH] More page migration: do not inc/dec rss counters · 442c9137
      Committed by Christoph Lameter
      If we install a migration entry then the rss does not really decrease since the
      page is just moved somewhere else.  We can save ourselves the work of
      decrementing and later incrementing which will just eventually cause cacheline
      bouncing.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      442c9137
    • [PATCH] Swapless page migration: rip out swap based logic · d75a0fcd
      Committed by Christoph Lameter
      Rip the page migration logic out.
      
      Remove all code that has to do with swapping during page migration.
      
      This also guts the ability to migrate pages to swap.  No one used that so let's
      let it go for good.
      
      Page migration should be a bit broken after this patch.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d75a0fcd
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Committed by Christoph Lameter
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  Migration entries can
      only be encountered when the page they are pointing to is locked.  This
      limits the number of places one has to fix.  We also check in
      copy_pte_range and in mprotect_pte_range() for migration ptes.
      
      We check for migration ptes in do_swap_cache and call a function that will
      then wait on the page lock.  This allows us to effectively stop all
      accesses to the page.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0697212a
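      Migration entries reuse the swap-entry format, with the top two swap
      types reserved for the read and write variants.  A hedged sketch of the
      helpers described above (shapes follow include/linux/swapops.h; treat the
      exact definitions as illustrative):

      #include <linux/swapops.h>
      #include <linux/mm.h>

      /* Encode the target page and its writability in an otherwise-invalid
       * swap entry. */
      static swp_entry_t sketch_make_migration_entry(struct page *page, int write)
      {
              return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
                               page_to_pfn(page));
      }

      static int sketch_is_write_migration_entry(swp_entry_t entry)
      {
              return swp_type(entry) == SWP_MIGRATION_WRITE;
      }

      /* Downgrade write to read, as fork() must do when write-protecting COW
       * mappings (the bug Hugh describes above). */
      static swp_entry_t sketch_make_migration_entry_read(swp_entry_t entry)
      {
              return swp_entry(SWP_MIGRATION_READ, swp_offset(entry));
      }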
    • [PATCH] page migration cleanup: rename "ignrefs" to "migration" · 7352349a
      Committed by Christoph Lameter
      migrate is a better name since it is only used by page migration.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7352349a
  18. 22 March 2006, 2 commits
  19. 10 March 2006, 1 commit
  20. 01 March 2006, 1 commit
  21. 02 February 2006, 3 commits
  22. 19 January 2006, 1 commit
  23. 10 January 2006, 1 commit
  24. 09 January 2006, 1 commit
  25. 07 January 2006, 2 commits
  26. 30 November 2005, 1 commit
  27. 29 November 2005, 2 commits
    • [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Committed by Rik van Riel
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory - the system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f7b7fd8f
    • mm: re-architect the VM_UNPAGED logic · 6aab341e
      Committed by Linus Torvalds
      This replaces the (in my opinion horrible) VM_UNPAGED logic with very
      explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
      VM area to contain an arbitrary range of page table entries that the VM
      never touches, and never considers to be normal pages.
      
      Any user of "remap_pfn_range()" automatically gets this new
      functionality, and doesn't even have to mark the pages reserved or
      indeed mark them any other way.  It just works.  As a side effect, doing
      mmap() on /dev/mem works for arbitrary ranges.
      
      Sparc update from David in the next commit.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6aab341e
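      Because remap_pfn_range() now marks the VMA as VM_PFNMAP and the core VM
      simply skips those ptes, a driver mmap() needs nothing beyond the call
      itself.  A minimal hedged sketch of such a handler (the device base
      address is illustrative):

      #include <linux/mm.h>
      #include <linux/fs.h>

      #define MY_DEVICE_PHYS_BASE     0x10000000UL    /* illustrative */

      static int my_mmap(struct file *file, struct vm_area_struct *vma)
      {
              unsigned long size = vma->vm_end - vma->vm_start;

              /* remap_pfn_range() sets VM_PFNMAP; the pages need not be
               * reserved or marked in any other way. */
              return remap_pfn_range(vma, vma->vm_start,
                                     MY_DEVICE_PHYS_BASE >> PAGE_SHIFT,
                                     size, vma->vm_page_prot);
      }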