1. 29 10月, 2006 2 次提交
    • M
      [PATCH] vmscan: Fix temp_priority race · 3bb1a852
      Martin Bligh 提交于
      The temp_priority field in zone is racy, as we can walk through a reclaim
      path, and just before we copy it into prev_priority, it can be overwritten
      (say with DEF_PRIORITY) by another reclaimer.
      
      The same bug is contained in both try_to_free_pages and balance_pgdat, but
      it is fixed slightly differently.  In balance_pgdat, we keep a separate
      priority record per zone in a local array.  In try_to_free_pages there is
      no need to do this, as the priority level is the same for all zones that we
      reclaim from.
      
      Impact of this bug is that temp_priority is copied into prev_priority, and
      setting this artificially high causes reclaimers to set distress
      artificially low.  They then fail to reclaim mapped pages, when they are,
      in fact, under severe memory pressure (their priority may be as low as 0).
      This causes the OOM killer to fire incorrectly.
      
      From: Andrew Morton <akpm@osdl.org>
      
      __zone_reclaim() isn't modifying zone->prev_priority.  But zone->prev_priority
      is used in the decision whether or not to bring mapped pages onto the inactive
      list.  Hence there's a risk here that __zone_reclaim() will fail because
      zone->prev_priority ir large (ie: low urgency) and lots of mapped pages end up
      stuck on the active list.
      
      Fix that up by decreasing (ie making more urgent) zone->prev_priority as
      __zone_reclaim() scans the zone's pages.
      
      This bug perhaps explains why ZONE_RECLAIM_PRIORITY was created.  It should be
      possible to remove that now, and to just start out at DEF_PRIORITY?
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3bb1a852
    • N
      [PATCH] mm: clean up pagecache allocation · 2ae88149
      Nick Piggin 提交于
      - Consolidate page_cache_alloc
      
      - Fix splice: only the pagecache pages and filesystem data need to use
        mapping_gfp_mask.
      
      - Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2ae88149
  2. 22 10月, 2006 2 次提交
  3. 21 10月, 2006 6 次提交
    • N
      [PATCH] mm: more commenting on lock ordering · 82591e6e
      Nick Piggin 提交于
      Clarify lockorder comments now that sys_msync dropps mmap_sem before
      calling do_fsync.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      82591e6e
    • D
      [PATCH] mm: D-cache aliasing issue in cow_user_page · c4ec7b0d
      Dmitriy Monakhov 提交于
      --=-=-=
      
       from mm/memory.c:
        1434  static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
        1435  {
        1436          /*
        1437           * If the source page was a PFN mapping, we don't have
        1438           * a "struct page" for it. We do a best-effort copy by
        1439           * just copying from the original user address. If that
        1440           * fails, we just zero-fill it. Live with it.
        1441           */
        1442          if (unlikely(!src)) {
        1443                  void *kaddr = kmap_atomic(dst, KM_USER0);
        1444                  void __user *uaddr = (void __user *)(va & PAGE_MASK);
        1445
        1446                  /*
        1447                   * This really shouldn't fail, because the page is there
        1448                   * in the page tables. But it might just be unreadable,
        1449                   * in which case we just give up and fill the result with
        1450                   * zeroes.
        1451                   */
        1452                  if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
        1453                          memset(kaddr, 0, PAGE_SIZE);
        1454                  kunmap_atomic(kaddr, KM_USER0);
        #### D-cache have to be flushed here.
        #### It seems it is just forgotten.
      
        1455                  return;
        1456
        1457          }
        1458          copy_user_highpage(dst, src, va);
        #### Ok here. flush_dcache_page() called from this func if arch need it
        1459  }
      
      Following is the patch  fix this issue:
      Signed-off-by: NDmitriy Monakhov <dmonakhov@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c4ec7b0d
    • A
      [PATCH] highest_possible_node_id() linkage fix · 6220ec78
      Andrew Morton 提交于
      Qooting Adrian:
      
      - net/sunrpc/svc.c uses highest_possible_node_id()
      
      - include/linux/nodemask.h says highest_possible_node_id() is
        out-of-line #if MAX_NUMNODES > 1
      
      - the out-of-line highest_possible_node_id() is in lib/cpumask.c
      
      - lib/Makefile: lib-$(CONFIG_SMP) += cpumask.o
        CONFIG_ARCH_DISCONTIGMEM_ENABLE=y, CONFIG_SMP=n, CONFIG_SUNRPC=y
      
      -> highest_possible_node_id() is used in net/sunrpc/svc.c
         CONFIG_NODES_SHIFT defined and > 0
      
      -> include/linux/numa.h: MAX_NUMNODES > 1
      
      -> compile error
      
      The bug is not present on architectures where ARCH_DISCONTIGMEM_ENABLE
      depends on NUMA (but m32r isn't the only affected architecture).
      
      So move the function into page_alloc.c
      
      Cc: Adrian Bunk <bunk@stusta.de>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6220ec78
    • A
      [PATCH] OOM killer meets userspace headers · 8ac773b4
      Alexey Dobriyan 提交于
      Despite mm.h is not being exported header, it does contain one thing
      which is part of userspace ABI -- value disabling OOM killer for given
      process. So,
      a) create and export include/linux/oom.h
      b) move OOM_DISABLE define there.
      c) turn bounding values of /proc/$PID/oom_adj into defines and export
         them too.
      
      Note: mass __KERNEL__ removal will be done later.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8ac773b4
    • A
      [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Andrew Morton 提交于
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3fcfab16
    • J
      [PATCH] direct-io: sync and invalidate file region when falling back to buffered write · fb5527e6
      Jeff Moyer 提交于
      When direct-io falls back to buffered write, it will just leave the dirty data
      floating about in pagecache, pending regular writeback.
      
      But normal direct-io semantics are that IO is synchronous, and that it leaves
      no pagecache behind.
      
      So change the fallback-to-buffered-write code to sync the file region and to
      then strip away the pagecache, just as a regular direct-io write would do.
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fb5527e6
  4. 20 10月, 2006 1 次提交
  5. 17 10月, 2006 3 次提交
    • A
      [PATCH] vmalloc(): don't pass __GFP_ZERO to slab · 286e1ea3
      Andrew Morton 提交于
      A recent change to the vmalloc() code accidentally resulted in us passing
      __GFP_ZERO into the slab allocator.  But we only wanted __GFP_ZERO for the
      actual pages whcih are being vmalloc()ed, and passing __GFP_ZERO into slab is
      not a rational thing to ask for.
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      286e1ea3
    • D
      [PATCH] knfsd: add nfs-export support to tmpfs · 91828a40
      David M. Grimes 提交于
      We need to encode a decode the 'file' part of a handle.  We simply use the
      inode number and generation number to construct the filehandle.
      
      The generation number is the time when the file was created.  As inode numbers
      cycle through the full 32 bits before being reused, there is no real chance of
      the same inum being allocated to different files in the same second so this is
      suitably unique.  Using time-of-day rather than e.g.  jiffies makes it less
      likely that the same filehandle can be created after a reboot.
      
      In order to be able to decode a filehandle we need to be able to lookup by
      inum, which means that the inode needs to be added to the inode hash table
      (tmpfs doesn't currently hash inodes as there is never a need to lookup by
      inum).  To avoid overhead when not exporting, we only hash an inode when it is
      first exported.  This requires a lock to ensure it isn't hashed twice.
      
      This code is separate from the patch posted in June06 from Atal Shargorodsky
      which provided the same functionality, but does borrow slightly from it.
      
      Locking comment: Most filesystems that hash their inodes do so at the point
      where the 'struct inode' is initialised, and that has suitable locking
      (I_NEW).  Here in shmem, we are hashing the inode later, the first time we
      need an NFS file handle for it.  We no longer have I_NEW to ensure only one
      thread tries to add it to the hash table.
      
      Cc: Atal Shargorodsky <atal@codefidence.com>
      Cc: Gilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: NDavid M. Grimes <dgrimes@navisite.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      91828a40
    • A
      [PATCH] invalidate: remove_mapping() fix · a649fd92
      Andrew Morton 提交于
      If remove_mapping() failed to remove the page from its mapping, don't go and
      mark it not uptodate!  Makes kernel go dead.
      
      (Actually, I don't think the ClearPageUptodate is needed there at all).
      
      Says Nick Piggin:
      
         "Right, it isn't needed because at this point the page is guaranteed
          by remove_mapping to have no references (except us) and cannot pick
          up any new ones because it is removed from pagecache.
      
          We can delete it."
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a649fd92
  6. 16 10月, 2006 1 次提交
  7. 12 10月, 2006 9 次提交
  8. 08 10月, 2006 1 次提交
  9. 06 10月, 2006 2 次提交
    • B
      [PATCH] page fault retry with NOPAGE_REFAULT · 7f7bbbe5
      Benjamin Herrenschmidt 提交于
      Add a way for a no_page() handler to request a retry of the faulting
      instruction.  It goes back to userland on page faults and just tries again
      in get_user_pages().  I added a cond_resched() in the loop in that later
      case.
      
      The problem I have with signal and spufs is an actual bug affecting apps and I
      don't see other ways of fixing it.
      
      In addition, we are having issues with infiniband and 64k pages (related to
      the way the hypervisor deals with some HV cards) that will require us to muck
      around with the MMU from within the IB driver's no_page() (it's a pSeries
      specific driver) and return to the caller the same way using NOPAGE_REFAULT.
      
      And to add to this, the graphics folks have been following a new approach of
      memory management that involves transparently swapping objects between video
      ram and main meory.  To do that, they need installing PTEs from a no_page()
      handler as well and that also requires returning with NOPAGE_REFAULT.
      
      (For the later, they are currently using io_remap_pfn_range to install one PTE
      from no_page() which is a bit racy, we need to add a check for the PTE having
      already been installed afer taking the lock, but that's ok, they are only at
      the proof-of-concept stage.  I'll send a patch adding a "clean" function to do
      that, we can use that from spufs too and get rid of the sparsemem hacks we do
      to create struct page for SPEs.  Basically, that provides a generic solution
      for being able to have no_page() map hardware devices, which is something that
      I think sound driver folks have been asking for some time too).
      
      All of these things depend on having the NOPAGE_REFAULT exit path from
      no_page() handlers.
      Signed-off-by: NBenjamin Herrenchmidt <benh@kernel.crashing.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7f7bbbe5
    • P
      [PATCH] slab: reduce numa text size · 1ca4cb24
      Pekka Enberg 提交于
      Reduce the NUMA text size of mm/slab.o a little on x86 by using a local
      variable to store the result of numa_node_id().
      
          text    data     bss     dec     hex filename
         16858    2584      16   19458    4c02 mm/slab.o (before)
         16804    2584      16   19404    4bcc mm/slab.o (after)
      
      [akpm@osdl.org: use better names]
      [pbadari@us.ibm.com: fix that]
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1ca4cb24
  10. 04 10月, 2006 10 次提交
  11. 01 10月, 2006 3 次提交