1. 18 7月, 2007 1 次提交
    • M
      Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Mel Gorman 提交于
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using the __GFP_MOVABLE can be either migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage() is added for
      __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable().  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickens.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickens for catching issues with shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      769848c0
  2. 17 7月, 2007 3 次提交
  3. 17 6月, 2007 1 次提交
  4. 17 5月, 2007 1 次提交
  5. 08 5月, 2007 3 次提交
  6. 13 2月, 2007 2 次提交
  7. 12 2月, 2007 3 次提交
  8. 27 1月, 2007 2 次提交
    • R
      [PATCH] i386 vDSO: use VM_ALWAYSDUMP · f47aef55
      Roland McGrath 提交于
      This patch fixes core dumps to include the vDSO vma, which is left out now.
      It removes the special-case core writing macros, which were not doing the
      right thing for the vDSO vma anyway.  Instead, it uses VM_ALWAYSDUMP in the
      vma; there is no need for the fixmap page to be installed.  It handles the
      CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from
      get_gate_vma after real vmas in the same way the /proc/PID/maps code does.
      
      This changes core dumps so they no longer include the non-PT_LOAD phdrs from
      the vDSO.  I made the change to add them in the first place, but in turned out
      that nothing ever wanted them there since the advent of NT_AUXV.  It's cleaner
      to leave them out, and just let the phdrs inside the vDSO image speak for
      themselves.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f47aef55
    • R
      [PATCH] Fix gate_vma.vm_flags · b6558c4a
      Roland McGrath 提交于
      This patch fixes the initialization of gate_vma.vm_flags and
      gate_vma.vm_page_prot to reflect reality.  This makes the "[vdso]" line in
      /proc/PID/maps correctly show r-xp instead of ---p, when gate_vma is used
      (CONFIG_COMPAT_VDSO on i386).
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6558c4a
  9. 09 1月, 2007 1 次提交
    • R
      [ARM] pass vma for flush_anon_page() · a6f36be3
      Russell King 提交于
      Since get_user_pages() may be used with processes other than the
      current process and calls flush_anon_page(), flush_anon_page() has to
      cope in some way with non-current processes.
      
      It may not be appropriate, or even desirable to flush a region of
      virtual memory cache in the current process when that is different to
      the process that we want the flush to occur for.
      
      Therefore, pass the vma into flush_anon_page() so that the architecture
      can work out whether the 'vmaddr' is for the current process or not.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      a6f36be3
  10. 23 12月, 2006 1 次提交
  11. 14 12月, 2006 1 次提交
  12. 11 12月, 2006 1 次提交
    • H
      [PATCH] read_zero_pagealigned() locking fix · 5fcf7bb7
      Hugh Dickins 提交于
      Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
      bugzilla 7645.  Right: read_zero_pagealigned uses down_read of mmap_sem,
      but another thread's racing read of /dev/zero, or a normal fault, can
      easily set that pte again, in between zap_page_range and zeromap_page_range
      getting there.  It's been wrong ever since 2.4.3.
      
      The simple fix is to use down_write instead, but that would serialize reads
      of /dev/zero more than at present: perhaps some app would be badly
      affected.  So instead let zeromap_page_range return the error instead of
      BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
      that case - there's no need to optimize for it.
      
      Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
      zeromap_page_range), though it really isn't interesting there.  And since
      mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
      than -ENOMEM.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Ramiro Voicu: <Ramiro.Voicu@cern.ch>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5fcf7bb7
  13. 08 12月, 2006 2 次提交
  14. 21 10月, 2006 1 次提交
    • D
      [PATCH] mm: D-cache aliasing issue in cow_user_page · c4ec7b0d
      Dmitriy Monakhov 提交于
      --=-=-=
      
       from mm/memory.c:
        1434  static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
        1435  {
        1436          /*
        1437           * If the source page was a PFN mapping, we don't have
        1438           * a "struct page" for it. We do a best-effort copy by
        1439           * just copying from the original user address. If that
        1440           * fails, we just zero-fill it. Live with it.
        1441           */
        1442          if (unlikely(!src)) {
        1443                  void *kaddr = kmap_atomic(dst, KM_USER0);
        1444                  void __user *uaddr = (void __user *)(va & PAGE_MASK);
        1445
        1446                  /*
        1447                   * This really shouldn't fail, because the page is there
        1448                   * in the page tables. But it might just be unreadable,
        1449                   * in which case we just give up and fill the result with
        1450                   * zeroes.
        1451                   */
        1452                  if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
        1453                          memset(kaddr, 0, PAGE_SIZE);
        1454                  kunmap_atomic(kaddr, KM_USER0);
        #### D-cache have to be flushed here.
        #### It seems it is just forgotten.
      
        1455                  return;
        1456
        1457          }
        1458          copy_user_highpage(dst, src, va);
        #### Ok here. flush_dcache_page() called from this func if arch need it
        1459  }
      
      Following is the patch  fix this issue:
      Signed-off-by: NDmitriy Monakhov <dmonakhov@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c4ec7b0d
  15. 06 10月, 2006 1 次提交
    • B
      [PATCH] page fault retry with NOPAGE_REFAULT · 7f7bbbe5
      Benjamin Herrenschmidt 提交于
      Add a way for a no_page() handler to request a retry of the faulting
      instruction.  It goes back to userland on page faults and just tries again
      in get_user_pages().  I added a cond_resched() in the loop in that later
      case.
      
      The problem I have with signal and spufs is an actual bug affecting apps and I
      don't see other ways of fixing it.
      
      In addition, we are having issues with infiniband and 64k pages (related to
      the way the hypervisor deals with some HV cards) that will require us to muck
      around with the MMU from within the IB driver's no_page() (it's a pSeries
      specific driver) and return to the caller the same way using NOPAGE_REFAULT.
      
      And to add to this, the graphics folks have been following a new approach of
      memory management that involves transparently swapping objects between video
      ram and main meory.  To do that, they need installing PTEs from a no_page()
      handler as well and that also requires returning with NOPAGE_REFAULT.
      
      (For the later, they are currently using io_remap_pfn_range to install one PTE
      from no_page() which is a bit racy, we need to add a check for the PTE having
      already been installed afer taking the lock, but that's ok, they are only at
      the proof-of-concept stage.  I'll send a patch adding a "clean" function to do
      that, we can use that from spufs too and get rid of the sparsemem hacks we do
      to create struct page for SPEs.  Basically, that provides a generic solution
      for being able to have no_page() map hardware devices, which is something that
      I think sound driver folks have been asking for some time too).
      
      All of these things depend on having the NOPAGE_REFAULT exit path from
      no_page() handlers.
      Signed-off-by: NBenjamin Herrenchmidt <benh@kernel.crashing.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7f7bbbe5
  16. 01 10月, 2006 3 次提交
  17. 30 9月, 2006 1 次提交
    • S
      [PATCH] mm: fix a race condition under SMC + COW · 4ce072f1
      Siddha, Suresh B 提交于
      Failing context is a multi threaded process context and the failing
      sequence is as follows.
      
      One thread T0 doing self modifying code on page X on processor P0 and
      another thread T1 doing COW (breaking the COW setup as part of just
      happened fork() in another thread T2) on the same page X on processor P1.
      T0 doing SMC can endup modifying the new page Y (allocated by the T1 doing
      COW on P1) but because of different I/D TLB's, P0 ITLB will not see the new
      mapping till the flush TLB IPI from P1 is received.  During this interval,
      if T0 executes the code created by SMC it can result in an app error (as
      ITLB still points to old page X and endup executing the content in page X
      rather than using the content in page Y).
      
      Fix this issue by first clearing the PTE and flushing it, before updating
      it with new entry.
      
      Hugh sayeth:
      
        I was a bit sceptical, in the habit of thinking that Self Modifying Code
        must look such issues itself: but I guess there's nothing it can do to avoid
        this one.
      
        Fair enough, what you're changing it to is pretty much what powerpc and
        s390 were already doing, and is a more robust way of proceeding, consistent
        with how ptes are set everywhere else.
      
        The ptep_clear_flush is a bit heavy-handed (it's anxious to return the pte
        that was atomically cleared), but we'd have to wander through lots of arches
        to get the right minimal behaviour.  It'd also be nice to eliminate
        ptep_establish completely, now only used to define other macros/inlines: it
        always seemed obfuscation to me, what you've got there now is clearer.
        Let's put those cleanups on a TODO list.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4ce072f1
  18. 27 9月, 2006 2 次提交
    • D
      [PATCH] NOMMU: Check that access_process_vm() has a valid target · 0ec76a11
      David Howells 提交于
      Check that access_process_vm() is accessing a valid mapping in the target
      process.
      
      This limits ptrace() accesses and accesses through /proc/<pid>/maps to only
      those regions actually mapped by a program.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0ec76a11
    • J
      [PATCH] do_no_pfn() · f4b81804
      Jes Sorensen 提交于
      Implement do_no_pfn() for handling mapping of memory without a struct page
      backing it.  This avoids creating fake page table entries for regions which
      are not backed by real memory.
      
      This feature is used by the MSPEC driver and other users, where it is
      highly undesirable to have a struct page sitting behind the page (for
      instance if the page is accessed in cached mode via the struct page in
      parallel to the the driver accessing it uncached, which can result in data
      corruption on some architectures, such as ia64).
      
      This version uses specific NOPFN_{SIGBUS,OOM} return values, rather than
      expect all negative pfn values would be an error.  It also bugs on cow
      mappings as this would not work with the VM.
      
      [akpm@osdl.org: micro-optimise]
      Signed-off-by: NJes Sorensen <jes@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f4b81804
  19. 26 9月, 2006 4 次提交
    • R
      [PATCH] Add kerneldocs for some functions in mm/memory.c · bfa5bf6d
      Rolf Eike Beer 提交于
      These functions are already documented quite well with long comments.  Now
      add kerneldoc style header to make this turn up in everyones favorite doc
      format.
      Signed-off-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bfa5bf6d
    • P
      [PATCH] mm: fixup do_wp_page() · ee6a6457
      Peter Zijlstra 提交于
      Wrt. the recent modifications in do_wp_page() Hugh Dickins pointed out:
      
        "I now realize it's right to the first order (normal case) and to the
         second order (ptrace poke), but not to the third order (ptrace poke
         anon page here to be COWed - perhaps can't occur without intervening
         mprotects)."
      
      This patch restores the old COW behaviour for anonymous pages.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ee6a6457
    • P
      [PATCH] mm: balance dirty pages · edc79b2a
      Peter Zijlstra 提交于
      Now that we can detect writers of shared mappings, throttle them.  Avoids OOM
      by surprise.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      edc79b2a
    • P
      [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra 提交于
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - its shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Page from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages, if the page is found the be of a VMA that accounts dirty pages it will
      also wrprotect the PTE.
      
      Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d08b3851
  20. 15 7月, 2006 2 次提交
    • S
      [PATCH] per-task-delay-accounting: sync block I/O and swapin delay collection · 0ff92245
      Shailabh Nagar 提交于
      Unlike earlier iterations of the delay accounting patches, now delays are only
      collected for the actual I/O waits rather than try and cover the delays seen
      in I/O submission paths.
      
      Account separately for block I/O delays incurred as a result of swapin page
      faults whose frequency can be affected by the task/process' rss limit.  Hence
      swapin delays can act as feedback for rss limit changes independent of I/O
      priority changes.
      Signed-off-by: NShailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: NBalbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
      Cc: Erich Focht <efocht@ess.nec.de>
      Cc: Levent Serinol <lserinol@gmail.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0ff92245
    • A
      [PATCH] ia64: race flushing icache in COW path · c38c8db7
      Anil Keshavamurthy 提交于
      There is a race condition that showed up in a threaded JIT environment.
      The situation is that a process with a JIT code page forks, so the page is
      marked read-only, then some threads are created in the child.  One of the
      threads attempts to add a new code block to the JIT page, so a
      copy-on-write fault is taken, and the kernel allocates a new page, copies
      the data, installs the new pte, and then calls lazy_mmu_prot_update() to
      flush caches to make sure that the icache and dcache are in sync.
      Unfortunately, the other thread runs right after the new pte is installed,
      but before the caches have been flushed.  It tries to execute some old JIT
      code that was already in this page, but it sees some garbage in the i-cache
      from the previous users of the new physical page.
      
      Fix: we must make the caches consistent before installing the pte.  This is
      an ia64 only fix because lazy_mmu_prot_update() is a no-op on all other
      architectures.
      Signed-off-by: NAnil Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c38c8db7
  21. 11 7月, 2006 1 次提交
  22. 04 7月, 2006 1 次提交
  23. 01 7月, 2006 2 次提交