1. 22 March 2006, 4 commits
    • [PATCH] hugepage: Strict page reservation for hugepage inodes · b45b5bd6
      Committed by David Gibson
      These days, hugepages are demand-allocated at first fault time.  There's a
      somewhat dubious (and racy) heuristic when making a new mmap() to check if
      there are enough available hugepages to fully satisfy that mapping.
      
      A particularly obvious case where the heuristic breaks down is where a
      process maps its hugepages not as a single chunk, but as a bunch of
      individually mmap()ed (or shmat()ed) blocks without touching and
      instantiating the pages in between allocations.  In this case the size of
      each block is compared against the total number of available hugepages.
      It's thus easy for the process to become overcommitted, because each block
      mapping will succeed, although the total number of hugepages required by
      all blocks exceeds the number available.  In particular, this defeats a
      program that detects mapping failures and adjusts its hugepage usage
      downward accordingly (a user-space sketch of this failure mode follows
      this entry).
      
      The patch below addresses this problem, by strictly reserving a number of
      physical hugepages for hugepage inodes which have been mapped, but not
      instantiated.  MAP_SHARED mappings are thus "safe" - they will fail on
      mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
      trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
      only if the sysadmin explicitly reduces the hugepage pool between mapping
      and instantiation)
      
      This patch appears to address the problem at hand - it allows DB2 to start
      correctly, for instance, which previously suffered the failure described
      above.
      
      This patch causes no regressions in the libhugetlbfs testsuite, and makes
      a previously failing test (designed to catch this problem) pass (ppc64,
      POWER5).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b45b5bd6
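      A user-space sketch of the failure mode described above (a sketch only:
      the hugetlbfs path /mnt/huge/demo and the 16MB block size are invented
      for illustration and must match the system's hugepage configuration).
      Each mmap() is checked on its own, so before this patch all of the calls
      below could succeed even when their sum exceeds the hugepage pool; with
      strict reservation the overcommitting mmap() fails up front instead.

          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/mman.h>
          #include <unistd.h>

          #define BLOCK_SIZE (16UL * 1024 * 1024) /* multiple of the hugepage size (assumed) */
          #define NR_BLOCKS  8

          int main(void)
          {
                  int i;
                  int fd = open("/mnt/huge/demo", O_CREAT | O_RDWR, 0600);

                  if (fd < 0) {
                          perror("open");
                          return 1;
                  }
                  for (i = 0; i < NR_BLOCKS; i++) {
                          void *p = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                                         MAP_SHARED, fd, (off_t)i * BLOCK_SIZE);
                          if (p == MAP_FAILED) {
                                  /* With strict reservation the overcommit shows up
                                   * here, so the program can scale back its usage. */
                                  perror("mmap");
                                  break;
                          }
                          /* Pages deliberately left untouched: nothing is
                           * instantiated, so the old heuristic never notices. */
                  }
                  close(fd);
                  return 0;
          }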
    • [PATCH] mm: nommu use compound pages · 84097518
      Committed by Nick Piggin
      Now that compound page handling is properly fixed in the VM, move nommu
      over to using compound pages rather than rolling its own refcounting.
      
      nommu vm page refcounting is broken anyway, but there is no need to have
      divergent code in the core VM now, nor when it gets fixed.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      
      (Needs testing, please).
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      84097518
    • [PATCH] slab: Remove SLAB_NO_REAP option · ac2b898c
      Committed by Christoph Lameter
      SLAB_NO_REAP is documented as an option that will cause this slab not to be
      reaped under memory pressure.  However, that is not what happens.  The only
      thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
      slab elements that were allocated in batch in cache_reap().  cache_reap()
      is run every few seconds, independently of memory pressure.
      
      Could we remove the whole thing?  It's only used by three slabs anyway and
      I cannot find a reason for having this option.
      
      There is an additional problem with SLAB_NO_REAP.  If set then the recovery
      of objects from alien caches is switched off.  Objects not freed on the
      same node where they were initially allocated will only be reused if a
      certain number of objects accumulates from one alien node (not very likely)
      or if the cache is explicitly shrunk.  (Strangely, __cache_shrink does not
      check for SLAB_NO_REAP.)
      
      Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing
      (a sketch of the caller-side change follows this entry).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ac2b898c
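      A hedged sketch of the caller-side change implied by removing the flag.
      The cache name, struct foo, and foo_cachep are invented for illustration;
      the six-argument kmem_cache_create() form follows kernels of that era and
      is not quoted from this patch.

          #include <linux/errno.h>
          #include <linux/init.h>
          #include <linux/slab.h>

          struct foo {
                  int bar;
          };

          static kmem_cache_t *foo_cachep;

          static int __init foo_init(void)
          {
                  /* Before: the cache opted out of periodic reaping.
                   *
                   *   foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo),
                   *                                  0, SLAB_HWCACHE_ALIGN | SLAB_NO_REAP,
                   *                                  NULL, NULL);
                   *
                   * After: drop SLAB_NO_REAP; cache_reap() may now trim this slab's
                   * unused batched objects and drain its alien caches as well. */
                  foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo),
                                                 0, SLAB_HWCACHE_ALIGN, NULL, NULL);
                  return foo_cachep ? 0 : -ENOMEM;
          }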
    • [PATCH] v9fs: assign dentry ops to negative dentries · 5e7a99ac
      Committed by Latchesar Ionkov
      If a file is not found in v9fs_vfs_lookup, the function creates a negative
      dentry but doesn't assign any dentry ops.  This leaves the negative entry
      in the cache (there is no d_delete to mark it for removal).  If the file is
      created outside of the mounted v9fs filesystem, the file shows up in the
      directory with weird permissions.
      
      This patch assigns the default v9fs dentry ops to the negative dentry
      (a sketch of the idea follows this entry).
      Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5e7a99ac
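      A hedged sketch of the idea; the surrounding lookup code is omitted and
      the names (result, v9fs_dentry_operations) follow the v9fs code of that
      period from memory, so treat them as assumptions rather than quotes.

          /* In the lookup path, when the server reports that the name does
           * not exist, attach the dentry ops before caching the negative
           * dentry so ->d_delete and friends apply to it as well. */
          if (result == -ENOENT) {
                  dentry->d_op = &v9fs_dentry_operations;  /* was missing before the fix */
                  d_add(dentry, NULL);                     /* insert as a negative dentry */
                  return NULL;
          }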
  2. 21 March 2006, 7 commits
  3. 17 March 2006, 2 commits
  4. 16 March 2006, 2 commits
    • [PATCH] Fix ext2 readdir f_pos re-validation logic · 2d7f2ea9
      Committed by Al Viro
      This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
      in the ext2 filesystem "readdir()" function.  It also cleans up the code
      to avoid the unnecessary goto mess.
      
      The bugs were related to re-validating the f_pos value after somebody had
      either done an "lseek()" on the directory to an invalid offset, or when
      the offset had become invalid due to a file being unlinked in the
      directory.  The code would not only set f_version too eagerly, it would
      also fail to update f_pos appropriately when the offset fixup took place
      (a simplified sketch of the revalidation pattern follows this entry).
      
      When that happened, we'd occasionally go on to fail the readdir()
      even when we shouldn't (no real harm done, but an ugly printk, and
      obviously you would end up not necessarily seeing all entries).
      
      Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem
      and had a test-case for it, and also fixed up a thinko in the first
      version of this patch.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Masoud Sharbiani <masouds@google.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2d7f2ea9
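      A simplified sketch of the revalidation pattern described above, not the
      actual ext2 code; the helper and variable names (ext2_validate_entry,
      chunk_mask, need_revalidate) are recalled from the ext2 readdir path of
      that era and should be treated as assumptions.

          /* Inside the per-page loop of readdir: resync only when the
           * directory has changed under us, move f_pos forward to the next
           * real entry, and record the version only after the fixup. */
          if (need_revalidate) {
                  if (offset) {
                          offset = ext2_validate_entry(kaddr, offset, chunk_mask);
                          filp->f_pos = (n << PAGE_CACHE_SHIFT) + offset;
                  }
                  filp->f_version = inode->i_version;
                  need_revalidate = 0;
          }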
    • [PATCH] fs/namespace.c:dup_namespace(): fix a use after free · f13b8358
      Committed by Adrian Bunk
      The Coverity checker spotted the following bug in dup_namespace():
      
      <--  snip  -->
      
              if (!new_ns->root) {
                      up_write(&namespace_sem);
                      kfree(new_ns);
                      goto out;
              }
      ...
      out:
              return new_ns;
      
      <--  snip  -->
      
      Callers expect a non-NULL result not to be freed (a minimal sketch of one
      fix follows this entry).
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f13b8358
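      A minimal sketch of one way to resolve it (the actual patch may differ in
      detail): make the error path hand the caller NULL instead of a pointer to
      freed memory.

          if (!new_ns->root) {
                  up_write(&namespace_sem);
                  kfree(new_ns);
                  new_ns = NULL;  /* callers now see NULL, never a freed pointer */
                  goto out;
          }
          ...
          out:
                  return new_ns;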
  5. 15 March 2006, 3 commits
  6. 14 March 2006, 3 commits
    • [PATCH] NLM: Ensure we do not Oops in the case of an unlock · 30f4e20a
      Committed by Trond Myklebust
      In theory, NLM specs assure us that the server will only reply LCK_GRANTED or
      LCK_DENIED_GRACE_PERIOD to our NLM_UNLOCK request.
      
      In practice, we should not assume this to be the case, and the code will
      currently Oops if we do (a defensive-handling sketch follows this entry).
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      30f4e20a
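      A hedged sketch of the defensive handling (the constants are real NLM
      status codes, but the surrounding code is simplified and recalled from
      memory rather than quoted from the patch).

          if (resp->status == NLM_LCK_GRANTED)
                  return 0;
          if (resp->status != NLM_LCK_DENIED_GRACE_PERIOD)
                  printk(KERN_NOTICE "lockd: unexpected unlock status: %d\n",
                         resp->status);
          /* Whatever a non-conforming server sent, fail the unlock cleanly
           * rather than dereferencing state that only exists for the two
           * expected replies. */
          return -ENOLCK;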
    • [PATCH] NFSv4: fix mount segfault on errors returned that are < -1000 · c12e87f4
      Committed by Trond Myklebust
      It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead
      of mapping them to kernel errno values (a sketch of the kind of fix follows
      this entry).  Problem spotted by Neil Horman <nhorman@tuxdriver.com>.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c12e87f4
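      A hedged sketch of the kind of change described; nfs4_map_errors() is the
      NFS client helper that folds NFSv4 status codes into ordinary negative
      errno values, but the surrounding call is simplified and the exact call
      site is an assumption.

          /* Before: the raw RPC status (possibly an NFSv4 error well below
           * -1000) escaped straight to the mount path.
           *
           *   return rpc_call_sync(server->client, &msg, 0);
           *
           * After: map it to a normal errno first. */
          return nfs4_map_errors(rpc_call_sync(server->client, &msg, 0));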
    • [PATCH] NFS: Fix a potential panic in O_DIRECT · 143f412e
      Committed by Trond Myklebust
      Based on an original patch by Mike O'Connor and Greg Banks of SGI.
      
      Mike states:
      
      A normal user can panic an NFS client and cause a local DoS with
      'judicious'(?) use of O_DIRECT.  Any O_DIRECT write to an NFS file where the
      user buffer starts with a valid mapped page and contains an unmapped page,
      will crash in this way.  I haven't followed the code, but O_DIRECT reads with
      similar user buffers will probably also crash albeit in different ways.
      
      Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
      correctly handles get_user_pages() returning an error, which happens if the
      first page covered by the user buffer's address range is unmapped.  However,
      if the first page is mapped but some subsequent page isn't, get_user_pages()
      will return a positive number which is less than the number of pages requested
      (this behaviour is sort of analogous to a short write() call and appears to be
      intentional).  nfs_get_user_pages() doesn't detect this and hands off the
      array of pages (whose last few elements are random rubbish from the newly
      allocated array memory) to its caller, whence they go to
      nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
      and calculates its own idea of how many pages are in the array from the user
      buffer length.  Needless to say, when it comes time to transmit those
      uninitialised page* pointers, we see a crash in the network stack (a sketch
      of the defensive check follows this entry).
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      143f412e
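      A hedged sketch of the defensive check described above (simplified;
      variable names such as user_addr, nr_pages and write are illustrative):
      get_user_pages() may legitimately return fewer pages than requested when
      part of the buffer is unmapped, so a short return must be treated as a
      failure and the pinned pages released, instead of handing a partially
      filled page array to the write path.

          down_read(&current->mm->mmap_sem);
          result = get_user_pages(current, current->mm, user_addr,
                                  nr_pages, write, 0, pages, NULL);
          up_read(&current->mm->mmap_sem);
          if (result >= 0 && result < nr_pages) {
                  /* Short return: a page in the middle of the user buffer
                   * was unmapped.  Drop what was pinned and fail. */
                  for (i = 0; i < result; i++)
                          page_cache_release(pages[i]);
                  result = -EFAULT;
          }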
  7. 12 March 2006, 2 commits
  8. 10 March 2006, 2 commits
  9. 09 March 2006, 8 commits
  10. 08 March 2006, 1 commit
  11. 07 March 2006, 4 commits
  12. 05 March 2006, 1 commit
    • [CIFS] Always match oplock break (cache notification) to the right tcp session when multiply mounted · e77e6f3b
      Committed by Steve French
      
      Fixes slow response when the cifs client is mounted to shares on multiple
      servers and an oplock break occurs (usually due to an attempt to open a
      file multiple times).  When tree ids on multiply mounted shares match and
      we found the wrong match first, we would search the wrong set of cached
      files for the oplock break response, which usually meant that no matching
      file was found and the server had to time out the notification.  Oplock
      break timeout is
      about 20 seconds on some servers so this could cause significantly slower
      performance on file open calls in a few cases (in particular when multiple
      shares are mounted from multiple servers, tree ids match, and we have a
      cached file which is later opened multiple times).  This was the most
      important of the bugs found and fixed at Connectathon (an interoperability
      testing event) this week.  A sketch of the stricter matching rule follows
      this entry.
      
      Acked-by:  Shaggy (shaggy@austin.ibm.com)
      Signed-off-by: Steve French (sfrench@us.ibm.com)
      e77e6f3b
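      A hedged sketch of the stricter matching rule; the list and field names
      (GlobalTreeConnectionList, cifsTconInfo, tcon->ses->server) follow the
      cifs code of that period from memory and should be treated as assumptions
      rather than quotes from the patch.

          /* An oplock break arrived on TCP connection srv: match both the
           * tree id and the session's server, not merely the first tcon
           * anywhere in the global list with the same tid. */
          list_for_each(tmp, &GlobalTreeConnectionList) {
                  tcon = list_entry(tmp, struct cifsTconInfo, cifsConnectionList);
                  if (tcon->tid != smb->Tid)
                          continue;
                  if (tcon->ses == NULL || tcon->ses->server != srv)
                          continue;  /* same tid, different server: keep looking */
                  /* Right share found: locate the cached open file and queue
                   * the oplock break response on this connection. */
                  break;
          }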
  13. 03 March 2006, 1 commit