1. 14 Dec, 2006 1 commit
  2. 11 Dec, 2006 1 commit
    • [PATCH] read_zero_pagealigned() locking fix · 5fcf7bb7
      Hugh Dickins committed
      Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
      bugzilla 7645.  Right: read_zero_pagealigned uses down_read of mmap_sem,
      but another thread's racing read of /dev/zero, or a normal fault, can
      easily set that pte again, in between zap_page_range and zeromap_page_range
      getting there.  It's been wrong ever since 2.4.3.
      
      The simple fix is to use down_write instead, but that would serialize reads
      of /dev/zero more than at present: perhaps some app would be badly
      affected.  So instead let zeromap_page_range return the error instead of
      BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
      that case - there's no need to optimize for it.
      
      Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
      zeromap_page_range), though it really isn't interesting there.  And since
      mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
      than -ENOMEM.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Ramiro Voicu <Ramiro.Voicu@cern.ch>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5fcf7bb7
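The fallback described in this commit can be sketched in userspace C. This is a toy model under stated assumptions, not the kernel code: `toy_pte`, `toy_zeromap_range` and `toy_read_zero` are hypothetical names, and `memset` stands in both for mapping the zero page and for the clear_user loop. Only the idea of returning -EEXIST instead of hitting BUG_ON comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_NPTES 8

/* Toy page table: 0 means "pte_none", anything else means populated. */
static int toy_pte[TOY_NPTES];

/*
 * Fast path: mark a range of ptes as mapping the zero page.
 * Returns -EEXIST (instead of BUG()ing) if a racing fault has
 * already populated one of the ptes.
 */
static int toy_zeromap_range(size_t first, size_t count)
{
    for (size_t i = first; i < first + count; i++) {
        if (toy_pte[i] != 0)
            return -EEXIST;
        toy_pte[i] = 1; /* pretend this pte now maps the zero page */
    }
    return 0;
}

/*
 * Reader: try the fast path and fall back to the slow clearing loop
 * on any error. Returns 1 if the fast path was used, 0 otherwise.
 */
static int toy_read_zero(char *buf, size_t len, size_t first, size_t count)
{
    int fast = (toy_zeromap_range(first, count) == 0);

    /*
     * In the kernel the fast path makes the buffer read as zeroes by
     * remapping it, and the slow path clears it with clear_user().
     * Both are modelled here by the same memset().
     */
    memset(buf, 0, len);
    return fast;
}
```

The caller does not need to distinguish -EEXIST from -EAGAIN in this sketch: any failure of the fast path simply selects the slow one, which matches the "no need to optimize for it" remark above.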
  3. 08 Dec, 2006 2 commits
  4. 21 Oct, 2006 1 commit
    • [PATCH] mm: D-cache aliasing issue in cow_user_page · c4ec7b0d
      Dmitriy Monakhov committed
      From mm/memory.c:

        static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
        {
                /*
                 * If the source page was a PFN mapping, we don't have
                 * a "struct page" for it. We do a best-effort copy by
                 * just copying from the original user address. If that
                 * fails, we just zero-fill it. Live with it.
                 */
                if (unlikely(!src)) {
                        void *kaddr = kmap_atomic(dst, KM_USER0);
                        void __user *uaddr = (void __user *)(va & PAGE_MASK);

                        /*
                         * This really shouldn't fail, because the page is there
                         * in the page tables. But it might just be unreadable,
                         * in which case we just give up and fill the result with
                         * zeroes.
                         */
                        if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
                                memset(kaddr, 0, PAGE_SIZE);
                        kunmap_atomic(kaddr, KM_USER0);
                        /* NOTE: the D-cache has to be flushed here. */
                        /* NOTE: it seems this was simply forgotten. */
                        return;
                }
                copy_user_highpage(dst, src, va);
                /* NOTE: OK here; flush_dcache_page() is called from this function if the arch needs it. */
        }

      The following patch fixes this issue:
      Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c4ec7b0d
  5. 06 Oct, 2006 1 commit
    • [PATCH] page fault retry with NOPAGE_REFAULT · 7f7bbbe5
      Benjamin Herrenschmidt committed
      Add a way for a no_page() handler to request a retry of the faulting
      instruction.  It goes back to userland on page faults and just tries again
      in get_user_pages().  I added a cond_resched() in the loop in that latter
      case.
      
      The problem I have with signal and spufs is an actual bug affecting apps and I
      don't see other ways of fixing it.
      
      In addition, we are having issues with infiniband and 64k pages (related to
      the way the hypervisor deals with some HV cards) that will require us to muck
      around with the MMU from within the IB driver's no_page() (it's a pSeries
      specific driver) and return to the caller the same way using NOPAGE_REFAULT.
      
      And to add to this, the graphics folks have been following a new approach of
      memory management that involves transparently swapping objects between video
      RAM and main memory.  To do that, they need to install PTEs from a no_page()
      handler as well, and that also requires returning with NOPAGE_REFAULT.
      
      (For the latter, they are currently using io_remap_pfn_range to install one PTE
      from no_page(), which is a bit racy: we need to add a check for the PTE having
      already been installed after taking the lock, but that's OK, they are only at
      the proof-of-concept stage.  I'll send a patch adding a "clean" function to do
      that, we can use that from spufs too and get rid of the sparsemem hacks we do
      to create struct page for SPEs.  Basically, that provides a generic solution
      for being able to have no_page() map hardware devices, which is something that
      I think sound driver folks have been asking for some time too).
      
      All of these things depend on having the NOPAGE_REFAULT exit path from
      no_page() handlers.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7f7bbbe5
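The retry path this commit describes can be illustrated with a small userspace model. It is only a sketch: `TOY_NOPAGE_REFAULT`, `toy_no_page` and `toy_fault_in` are hypothetical stand-ins, and the sentinel encoding is an assumption made for the example, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for NOPAGE_REFAULT. */
#define TOY_NOPAGE_REFAULT ((void *)-1)

struct toy_vma {
    int retries_wanted; /* how many times the handler asks for a retry */
    int page;           /* the "page" that is eventually returned */
};

/* Toy no_page() handler: it may install the mapping itself and ask for a retry. */
static void *toy_no_page(struct toy_vma *vma)
{
    if (vma->retries_wanted > 0) {
        vma->retries_wanted--;
        return TOY_NOPAGE_REFAULT;
    }
    return &vma->page;
}

/*
 * get_user_pages()-style caller: loop until the handler stops asking
 * for a retry. The kernel loop would call cond_resched() on each pass.
 */
static void *toy_fault_in(struct toy_vma *vma, int *retries)
{
    void *page;

    *retries = 0;
    while ((page = toy_no_page(vma)) == TOY_NOPAGE_REFAULT)
        (*retries)++;
    return page;
}
```

For a real userland page fault the equivalent of the loop is simply returning to the faulting instruction, which faults again if the mapping is still absent.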
  6. 01 Oct, 2006 3 commits
  7. 30 Sep, 2006 1 commit
    • [PATCH] mm: fix a race condition under SMC + COW · 4ce072f1
      Siddha, Suresh B committed
      The failing context is a multithreaded process, and the failing sequence
      is as follows.

      One thread, T0, is running self-modifying code (SMC) on page X on processor
      P0, while another thread, T1, is doing COW on the same page X on processor P1
      (breaking the COW set up by a fork() that has just happened in a third
      thread, T2).  T0's SMC can end up modifying the new page Y (allocated by T1
      doing COW on P1), but because the I- and D-TLBs are separate, P0's ITLB will
      not see the new mapping until the flush TLB IPI from P1 is received.  During
      this interval, if T0 executes the code created by SMC, it can result in an
      app error (the ITLB still points to the old page X, so T0 ends up executing
      the content of page X rather than the content of page Y).
      
      Fix this issue by first clearing the PTE and flushing it, before updating
      it with new entry.
      
      Hugh sayeth:
      
        I was a bit sceptical, in the habit of thinking that Self Modifying Code
        must look after such issues itself: but I guess there's nothing it can do to
        avoid this one.
      
        Fair enough, what you're changing it to is pretty much what powerpc and
        s390 were already doing, and is a more robust way of proceeding, consistent
        with how ptes are set everywhere else.
      
        The ptep_clear_flush is a bit heavy-handed (it's anxious to return the pte
        that was atomically cleared), but we'd have to wander through lots of arches
        to get the right minimal behaviour.  It'd also be nice to eliminate
        ptep_establish completely, now only used to define other macros/inlines: it
        always seemed obfuscation to me, what you've got there now is clearer.
        Let's put those cleanups on a TODO list.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4ce072f1
  8. 27 Sep, 2006 2 commits
    • [PATCH] NOMMU: Check that access_process_vm() has a valid target · 0ec76a11
      David Howells committed
      Check that access_process_vm() is accessing a valid mapping in the target
      process.
      
      This limits ptrace() accesses and accesses through /proc/<pid>/maps to only
      those regions actually mapped by a program.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0ec76a11
    • [PATCH] do_no_pfn() · f4b81804
      Jes Sorensen committed
      Implement do_no_pfn() for handling mapping of memory without a struct page
      backing it.  This avoids creating fake page table entries for regions which
      are not backed by real memory.
      
      This feature is used by the MSPEC driver and other users, where it is
      highly undesirable to have a struct page sitting behind the page (for
      instance if the page is accessed in cached mode via the struct page in
      parallel to the driver accessing it uncached, which can result in data
      corruption on some architectures, such as ia64).
      
      This version uses specific NOPFN_{SIGBUS,OOM} return values, rather than
      expecting every negative pfn value to be an error.  It also BUGs on COW
      mappings, as these would not work with the VM.
      
      [akpm@osdl.org: micro-optimise]
      Signed-off-by: Jes Sorensen <jes@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f4b81804
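The point about dedicated return values can be shown with a short userspace sketch. The sentinel encodings, the `toy_` names and the address layout below are assumptions made for illustration; the commit message only states that NOPFN_SIGBUS and NOPFN_OOM exist as specific values.

```c
#include <assert.h>

/* Assumed encodings for the two sentinels (illustrative only). */
#define TOY_NOPFN_SIGBUS ((unsigned long)-1)
#define TOY_NOPFN_OOM    ((unsigned long)-2)

enum toy_fault { TOY_FAULT_MINOR, TOY_FAULT_SIGBUS, TOY_FAULT_OOM };

/* Hypothetical driver handler: maps an address to a pfn or a sentinel. */
static unsigned long toy_nopfn(unsigned long address)
{
    if (address >= 0x4000UL)
        return TOY_NOPFN_SIGBUS;          /* outside the device window */
    if (address == 0x3000UL)
        return TOY_NOPFN_OOM;             /* pretend an allocation failed */
    return 0x80000UL + (address >> 12);   /* a valid page frame number */
}

/*
 * Fault-path caller: only the two sentinels are errors. Any other
 * value is a pfn, however large, so no sign test is needed.
 */
static enum toy_fault toy_do_no_pfn(unsigned long address, unsigned long *pfn_out)
{
    unsigned long pfn = toy_nopfn(address);

    if (pfn == TOY_NOPFN_OOM)
        return TOY_FAULT_OOM;
    if (pfn == TOY_NOPFN_SIGBUS)
        return TOY_FAULT_SIGBUS;
    *pfn_out = pfn;
    return TOY_FAULT_MINOR;
}
```

Comparing against named sentinels keeps the whole remaining value range available for real page frame numbers.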
  9. 26 Sep, 2006 4 commits
    • [PATCH] Add kerneldocs for some functions in mm/memory.c · bfa5bf6d
      Rolf Eike Beer committed
      These functions are already documented quite well with long comments.  Now
      add kerneldoc-style headers to make them turn up in everyone's favorite doc
      format.
      Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      bfa5bf6d
    • [PATCH] mm: fixup do_wp_page() · ee6a6457
      Peter Zijlstra committed
      Wrt. the recent modifications in do_wp_page() Hugh Dickins pointed out:
      
        "I now realize it's right to the first order (normal case) and to the
         second order (ptrace poke), but not to the third order (ptrace poke
         anon page here to be COWed - perhaps can't occur without intervening
         mprotects)."
      
      This patch restores the old COW behaviour for anonymous pages.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ee6a6457
    • [PATCH] mm: balance dirty pages · edc79b2a
      Peter Zijlstra committed
      Now that we can detect writers of shared mappings, throttle them.  Avoids OOM
      by surprise.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      edc79b2a
    • [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra committed
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - it is shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean(), a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages; if the page is found to belong to a VMA that accounts dirty pages, it
      will also wrprotect the PTE.

      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d08b3851
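The four-part eligibility heuristic listed in this commit reads naturally as a predicate. The sketch below is a userspace illustration: the flag values, `struct toy_vma` and `toy_vma_tracks_dirty` are hypothetical, and the two boolean fields stand in for mapping_cap_account_dirty() and for the check that f_op->mmap() left the default page protection alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real ones live in the kernel headers. */
#define TOY_VM_WRITE      0x0002UL
#define TOY_VM_SHARED     0x0008UL
#define TOY_VM_PFNMAP     0x0400UL
#define TOY_VM_INSERTPAGE 0x0800UL

struct toy_vma {
    unsigned long vm_flags;
    bool cap_account_dirty;  /* stands in for mapping_cap_account_dirty() */
    bool default_page_prot;  /* f_op->mmap() did not change the protection */
};

/* Returns true when the VMA should have its clean pages write-protected. */
static bool toy_vma_tracks_dirty(const struct toy_vma *vma)
{
    const unsigned long shared_rw = TOY_VM_WRITE | TOY_VM_SHARED;

    /* 1. It must be shared and writeable. */
    if ((vma->vm_flags & shared_rw) != shared_rw)
        return false;
    /* 2. It must not be a 'special' mapping. */
    if ((vma->vm_flags & (TOY_VM_PFNMAP | TOY_VM_INSERTPAGE)) != 0)
        return false;
    /* 3. The backing device must account dirty pages. */
    if (!vma->cap_account_dirty)
        return false;
    /* 4. The default page protection must be intact. */
    return vma->default_page_prot;
}
```

Per the commit message, mprotect() applies the same test but overrides the last condition; that variant is not modelled here.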
  10. 15 Jul, 2006 2 commits
    • [PATCH] per-task-delay-accounting: sync block I/O and swapin delay collection · 0ff92245
      Shailabh Nagar committed
      Unlike earlier iterations of the delay accounting patches, delays are now
      collected only for the actual I/O waits, rather than trying to cover the
      delays seen in I/O submission paths.
      
      Account separately for block I/O delays incurred as a result of swapin page
      faults whose frequency can be affected by the task/process' rss limit.  Hence
      swapin delays can act as feedback for rss limit changes independent of I/O
      priority changes.
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
      Cc: Erich Focht <efocht@ess.nec.de>
      Cc: Levent Serinol <lserinol@gmail.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0ff92245
    • [PATCH] ia64: race flushing icache in COW path · c38c8db7
      Anil Keshavamurthy committed
      There is a race condition that showed up in a threaded JIT environment.
      The situation is that a process with a JIT code page forks, so the page is
      marked read-only, then some threads are created in the child.  One of the
      threads attempts to add a new code block to the JIT page, so a
      copy-on-write fault is taken, and the kernel allocates a new page, copies
      the data, installs the new pte, and then calls lazy_mmu_prot_update() to
      flush caches to make sure that the icache and dcache are in sync.
      Unfortunately, the other thread runs right after the new pte is installed,
      but before the caches have been flushed.  It tries to execute some old JIT
      code that was already in this page, but it sees some garbage in the i-cache
      from the previous users of the new physical page.
      
      Fix: we must make the caches consistent before installing the pte.  This is
      an ia64 only fix because lazy_mmu_prot_update() is a no-op on all other
      architectures.
      Signed-off-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c38c8db7
  11. 11 Jul, 2006 1 commit
  12. 04 Jul, 2006 1 commit
  13. 01 Jul, 2006 2 commits
  14. 23 Jun, 2006 3 commits
    • [PATCH] add page_mkwrite() vm_operations method · 9637a5ef
      David Howells committed
      Add a new VMA operation to notify a filesystem or other driver about the
      MMU generating a fault because userspace attempted to write to a page
      mapped through a read-only PTE.
      
      This facility permits the filesystem or driver to:
      
       (*) Implement storage allocation/reservation on attempted write, and so to
           deal with problems such as ENOSPC more gracefully (perhaps by generating
           SIGBUS).
      
       (*) Delay making the page writable until the contents have been written to a
           backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
           It permits the filesystem to have some guarantee about the state of the
           cache.
      
       (*) Account and limit number of dirty pages. This is one piece of the puzzle
           needed to make shared writable mapping work safely in FUSE.
      
      Needed by cachefs (Or is it cachefiles?  Or fscache? <head spins>).
      
      At least four other groups have stated an interest in it or a desire to use
      the functionality it provides: FUSE, OCFS2, NTFS and JFFS2.  Also, things like
      EXT3 really ought to use it to deal with the case of shared-writable mmap
      encountering ENOSPC before we permit the page to be dirtied.
      
      From: Peter Zijlstra <a.p.zijlstra@chello.nl>
      
        get_user_pages(.write=1, .force=1) can generate COW hits on read-only
        shared mappings; this patch traps those as page_mkwrite candidates and fails
        to handle them the old way.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Joel Becker <Joel.Becker@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9637a5ef
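The first use case named in this commit, reserving storage on an attempted write and reporting ENOSPC as SIGBUS, can be sketched as a callback in an operations table. Everything below is a userspace illustration with hypothetical names (`toy_vm_ops`, `toy_fs`, `toy_write_fault`); the callback signature is an assumption for the example and is not taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_page { int writable; };

struct toy_vm_ops {
    /* Called before a read-only page is made writable; a negative return vetoes it. */
    int (*page_mkwrite)(struct toy_page *page, void *fs_private);
};

struct toy_fs { long free_blocks; };

/* Toy filesystem hook: reserve one block of backing storage per dirtied page. */
static int toy_fs_page_mkwrite(struct toy_page *page, void *fs_private)
{
    struct toy_fs *fs = fs_private;

    (void)page;
    if (fs->free_blocks == 0)
        return -ENOSPC;
    fs->free_blocks--;
    return 0;
}

enum toy_result { TOY_OK, TOY_SIGBUS };

/* Write-fault path: ask the owner first, and only then make the page writable. */
static enum toy_result toy_write_fault(const struct toy_vm_ops *ops,
                                       struct toy_page *page, void *fs_private)
{
    if (ops->page_mkwrite != NULL && ops->page_mkwrite(page, fs_private) < 0)
        return TOY_SIGBUS;
    page->writable = 1;
    return TOY_OK;
}
```

Because the hook runs before the page becomes writable, the failure surfaces at the faulting store rather than later at write-back time.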
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Christoph Lameter committed
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  Migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.

      We check for migration ptes in do_swap_page and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to a page.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0697212a
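Two of the follow-up fixes folded into this commit are easy to reproduce in miniature: passing SWP_MIGRATION_READ (30) where a boolean "write" is expected, and a swap type range check that assumed MAX_SWAPFILES was a power of two. The sketch below uses hypothetical `toy_` names; the numeric values follow the message (the upper two of the swap types are reserved, and the read type is 30), with a shift of 5 assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SWAPFILES_SHIFT 5
#define TOY_MAX_SWAPFILES       30  /* the upper two types are reserved */
#define TOY_SWP_MIGRATION_READ  30
#define TOY_SWP_MIGRATION_WRITE 31

struct toy_swp_entry {
    unsigned type;
    unsigned long offset;
};

/* The second argument is a boolean "write" flag, not a swap type. */
static struct toy_swp_entry toy_make_migration_entry(unsigned long pfn, bool write)
{
    struct toy_swp_entry e;

    e.type = write ? TOY_SWP_MIGRATION_WRITE : TOY_SWP_MIGRATION_READ;
    e.offset = pfn;
    return e;
}

static bool toy_is_write_migration_entry(struct toy_swp_entry e)
{
    return e.type == TOY_SWP_MIGRATION_WRITE;
}

/* Downgrade in place, as the fix does for write protection on fork. */
static void toy_make_migration_entry_read(struct toy_swp_entry *e)
{
    e->type = TOY_SWP_MIGRATION_READ;
}

/* Range check written against MAX_SWAPFILES directly, not against the shift. */
static bool toy_swap_type_valid(unsigned type)
{
    return type < TOY_MAX_SWAPFILES;
}
```

Calling `toy_make_migration_entry(pfn, TOY_SWP_MIGRATION_READ)` yields a write entry, because 30 converts to true; this is the trap the message describes, and the in-place downgrade helper avoids it.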
    • [PATCH] Page Migration: Make do_swap_page redo the fault · 4da5eda0
      Christoph Lameter committed
      It is better to redo the complete fault if do_swap_page() finds that the
      page is not in PageSwapCache() because the page migration code may have
      replaced the swap pte already with a pte pointing to valid memory.
      
      do_swap_page() may interpret an invalid swap entry without this patch
      because we do not reload the pte if we are looping back.  The page
      migration code may already have reused the swap entry referenced by our
      local swp_entry.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4da5eda0
  15. 01 Apr, 2006 1 commit
    • [PATCH] Don't pass boot parameters to argv_init[] · 9b41046c
      OGAWA Hirofumi committed
      The boot cmdline is parsed in parse_early_param() and
      parse_args(,unknown_bootoption).
      
      And __setup() is used in obsolete_checksetup().
      
      	start_kernel()
      		-> parse_args()
      			-> unknown_bootoption()
      				-> obsolete_checksetup()
      
      If __setup()'s callback (->setup_func()) returns 1 in
      obsolete_checksetup(), obsolete_checksetup() considers the parameter
      handled.

      If ->setup_func() returns 0, obsolete_checksetup() tries the other
      ->setup_func() callbacks.  If every ->setup_func() that matches the
      parameter returns 0, the parameter is added to argv_init[].

      Then, when running /sbin/init or init=app, argv_init[] is passed to the app.
      If the app doesn't ignore those arguments, it will print a warning and exit.

      This patch fixes wrong usages of it, but only the obvious ones.
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9b41046c
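The return-value convention described in this commit can be modelled in a few lines of userspace C. The table, the two callbacks and the `toy_` function names are hypothetical; only the rule itself (1 means handled, 0 lets the parameter fall through to argv_init[]) comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_setup {
    const char *str;
    int (*setup_func)(const char *arg);
};

/* A well-behaved callback: it consumed the parameter, so it returns 1. */
static int toy_handled(const char *arg) { (void)arg; return 1; }
/* A callback with the wrong usage: it returns 0 although it matched. */
static int toy_declined(const char *arg) { (void)arg; return 0; }

static const struct toy_setup toy_setups[] = {
    { "quiet", toy_handled },
    { "debug", toy_declined },
};

#define TOY_MAX_INIT_ARGS 8
static const char *toy_argv_init[TOY_MAX_INIT_ARGS];
static size_t toy_argc_init;

/* Returns 1 if some matching callback reported the parameter as handled. */
static int toy_checksetup(const char *param)
{
    for (size_t i = 0; i < sizeof toy_setups / sizeof toy_setups[0]; i++) {
        size_t n = strlen(toy_setups[i].str);

        if (strncmp(param, toy_setups[i].str, n) == 0 &&
            toy_setups[i].setup_func(param + n))
            return 1;
    }
    return 0;
}

/* Unhandled parameters end up in the argument list passed to init. */
static void toy_unknown_bootoption(const char *param)
{
    if (toy_checksetup(param))
        return;
    if (toy_argc_init < TOY_MAX_INIT_ARGS)
        toy_argv_init[toy_argc_init++] = param;
}
```

In this model "debug" leaks into `toy_argv_init` even though a callback matched it, which is the behaviour the patch removes by making such callbacks return 1.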
  16. 27 Mar, 2006 2 commits
  17. 26 Mar, 2006 1 commit
  18. 22 Mar, 2006 4 commits
    • [PATCH] hugepage: Fix hugepage logic in free_pgtables() harder · 4866920b
      David Gibson committed
      loop coalescing multiple normal page VMAs into one call to free_pgd_range()
      had an off by one error, which could mean it would coalesce one hugepage
      VMA into the same bundle (checking 'vma' not 'next' in the loop).  I
      transferred this bug into the new is_vm_hugetlb_page() based version.
      Here's the fix.
      
      This one didn't bite on powerpc previously for the same reason the
      is_hugepage_only_range() problem didn't: powerpc's hugetlb_free_pgd_range()
      is identical to free_pgd_range().  It didn't bite on ia64 because the
      hugepage region is distant enough from any other region that the separated
      PMD_SIZE distance test would always prevent coalescing the two together.
      
      No libhugetlbfs testsuite regressions (ppc64, POWER5).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4866920b
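The off-by-one in the coalescing loop comes down to which VMA the hugepage test inspects. The following userspace sketch uses a hypothetical `struct toy_vma` and `toy_coalesce_end`, and it deliberately omits the PMD_SIZE distance test mentioned in the message, so it shows only the 'vma' versus 'next' point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vma {
    unsigned long start, end;
    bool hugetlb;            /* stands in for is_vm_hugetlb_page(vma) */
    struct toy_vma *next;
};

/*
 * Starting from a normal VMA, extend the range to free for as long as
 * the *next* VMA is also a normal one, and return the end address of
 * the bundle. Testing 'vma' instead of 'next' here would let one
 * hugepage VMA slip into the bundle.
 */
static unsigned long toy_coalesce_end(const struct toy_vma *vma)
{
    const struct toy_vma *next = vma->next;

    while (next != NULL && !next->hugetlb) {
        vma = next;
        next = vma->next;
    }
    return vma->end;
}
```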
    • [PATCH] hugepage: Fix hugepage logic in free_pgtables() · 9da61aef
      David Gibson committed
      free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
      of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
      to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
      range at the start of the vma.  is_hugepage_only_range() will return true
      if the given range has any intersection with a hugepage address region, and
      in this case the given region need not be hugepage aligned.  So, for
      example, this test can return true if called on, say, a 4k VMA immediately
      preceding a (nicely aligned) hugepage VMA.
      
      At present we get away with this because the powerpc version of
      hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
      only other arch with a non-trivial is_hugepage_only_range()) we get away
      with it for a different reason; the hugepage area is not contiguous with
      the rest of the user address space, and VMAs are not permitted in between,
      so the test can't return a false positive there.
      
      Nonetheless this should be fixed.  We do that in the patch below by
      replacing the is_hugepage_only_range() test with an explicit test of the
      VMA using is_vm_hugetlb_page().
      
      This in turn changes behaviour for platforms where is_hugepage_only_range()
      returns false always (everything except powerpc and ia64).  We address this
      by ensuring that hugetlb_free_pgd_range() is defined to be identical to
      free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
      it will prevent some otherwise possible coalescing of calls down to
      free_pgd_range().  Since this only happens for hugepage VMAs, removing this
      small optimization seems unlikely to cause any trouble.
      
      This patch causes no regressions on the libhugetlbfs testsuite - ppc64
      POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9da61aef
    • [PATCH] mm: more CONFIG_DEBUG_VM · b7ab795b
      Nick Piggin committed
      Put a few more checks under CONFIG_DEBUG_VM.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b7ab795b
    • [PATCH] mm: split highorder pages · 8dfcc9ba
      Nick Piggin committed
      Have an explicit mm call to split higher order pages into individual pages.
      Should help to avoid bugs and be more explicit about the code's intention.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8dfcc9ba
  19. 17 Mar, 2006 1 commit
    • [PATCH] fix free swap cache latency · 6f5e6b9e
      Hugh Dickins committed
      Lee Revell reported a 28ms latency when a process with lots of swapped memory
      exits.
      
      2.6.15 introduced a latency regression when unmapping: in accounting the
      zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
      swap entry counted nothing at all.  We think of pages present as the slow
      case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
      can make a lot of work - and we could have been doing it many thousands of
      times without a latency break.
      
      Move the zap_work update up to account swap entries like pages present.
      This does account non-linear pte_file entries, and unmap_mapping_range
      skipping over swap entries, by the same amount even though they're quick:
      but neither of those cases deserves complicating the code (and they're
      treated no worse than they were in 2.6.14).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6f5e6b9e
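The accounting change in this commit can be illustrated with a toy latency budget. The names (`toy_zap_cost`, `toy_zap_range`), the 4096-byte page size and the loop shape are assumptions for the sketch; the costs follow the message, where an empty pte counts 1 and, after the fix, swap entries count the same as present pages.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096L

enum toy_pte_kind { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_SWAP };

/* Cost charged against the zap_work budget for one pte. */
static long toy_zap_cost(enum toy_pte_kind kind)
{
    if (kind == TOY_PTE_NONE)
        return 1;
    /* Present pages and, after the fix, swap entries as well. */
    return TOY_PAGE_SIZE;
}

/*
 * Zap ptes until the budget is used up, then stop so the caller can
 * take a latency break. Returns the number of ptes processed.
 */
static int toy_zap_range(const enum toy_pte_kind *ptes, int n, long *zap_work)
{
    int i;

    for (i = 0; i < n && *zap_work > 0; i++)
        *zap_work -= toy_zap_cost(ptes[i]);
    return i;
}
```

Before the fix a swap entry cost nothing, so a long run of swap entries never exhausted the budget and the loop could run for many thousands of entries without a break.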
  20. 18 Feb, 2006 1 commit
  21. 02 Feb, 2006 1 commit
  22. 10 Jan, 2006 1 commit
  23. 09 Jan, 2006 1 commit
  24. 07 Jan, 2006 2 commits