1. 17 10月, 2007 5 次提交
    • N
      mm: buffered write cleanup · eb2be189
      Nick Piggin 提交于
      Quite a bit of code is used in maintaining these "cached pages" that are
      probably pretty unlikely to get used. It would require a narrow race where
      the page is inserted concurrently while this process is allocating a page
      in order to create the spare page. Then a multi-page write into an uncached
      part of the file, to make use of it.
      
      Next, the buffered write path (and others) uses its own LRU pagevec when it
      should be just using the per-CPU LRU pagevec (which will cut down on both data
      and code size cacheline footprint). Also, these private LRU pagevecs are
      emptied after just a very short time, in contrast with the per-CPU pagevecs
      that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
      to add the pages to pagecache for a bulk write (in 4K chunks).
      
      [this gets rid of some cond_resched() calls in readahead.c and mpage.c due
       to clashes in -mm. What put them there, and why? ]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb2be189
    • N
      fs: fix nobh error handling · a4b0672d
      Nick Piggin 提交于
      nobh mode error handling is not just pretty slack, it's wrong.
      
      One cannot zero out the whole page to ensure new blocks are zeroed, because
      it just brings the whole page "uptodate" with zeroes even if that may not
      be the correct uptodate data.  Also, other parts of the page may already
      contain dirty data which would get lost by zeroing it out.  Thirdly, the
      writeback of zeroes to the new blocks will also erase existing blocks.  All
      these conditions are pagecache and/or filesystem corruption.
      
      The problem comes about because we didn't keep track of which buffers
      actually are new or old.  However it is not enough just to keep only this
      state, because at the point we start dirtying parts of the page (new
      blocks, with zeroes), the handling of IO errors becomes impossible without
      buffers because the page may only be partially uptodate, in which case the
      page flags allone cannot capture the state of the parts of the page.
      
      So allocate all buffers for the page upfront, but leave them unattached so
      that they don't pick up any other references and can be freed when we're
      done.  If the error path is hit, then zero the new buffers as the regular
      buffer path does, then attach the buffers to the page so that it can
      actually be written out correctly and be subject to the normal IO error
      handling paths.
      
      As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page
      systems.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a4b0672d
    • D
      mm: add end_buffer_read helper function · 68671f35
      Dmitry Monakhov 提交于
      Move duplicated code from end_buffer_read_XXX methods to separate helper
      function.
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      68671f35
    • N
      remove ZERO_PAGE · 557ed1fa
      Nick Piggin 提交于
      The commit b5810039 contains the note
      
        A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
        (and thus mapcounted and count towards shared rss).  These writes to
        the struct page could cause excessive cacheline bouncing on big
        systems.  There are a number of ways this could be addressed if it is
        an issue.
      
      And indeed this cacheline bouncing has shown up on large SGI systems.
      There was a situation where an Altix system was essentially livelocked
      tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
      This situation can be avoided in userspace, but it does highlight the
      potential scalability problem with refcounting ZERO_PAGE, and corner
      cases where it can really hurt (we don't want the system to livelock!).
      
      There are several broad ways to fix this problem:
      1. add back some special casing to avoid refcounting ZERO_PAGE
      2. per-node or per-cpu ZERO_PAGES
      3. remove the ZERO_PAGE completely
      
      I will argue for 3. The others should also fix the problem, but they
      result in more complex code than does 3, with little or no real benefit
      that I can see.
      
      Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
      false optimisation: if an application is performance critical, it would
      not be doing many read faults of new memory, or at least it could be
      expected to write to that memory soon afterwards. If cache or memory use
      is critical, it should not be working with a significant number of
      ZERO_PAGEs anyway (a more compact representation of zeroes should be
      used).
      
      As a sanity check -- mesuring on my desktop system, there are never many
      mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
      increase much without it.
      
      When running a make -j4 kernel compile on my dual core system, there are
      about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
      ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
      is torn down without being COWed). So removing ZERO_PAGE will save 1,000
      page faults per second when running kbuild, while keeping it only saves
      less than 1 page clearing operation per second. 1 page clear is cheaper
      than a thousand faults, presumably, so there isn't an obvious loss.
      
      Neither the logical argument nor these basic tests give a guarantee of no
      regressions. However, this is a reasonable opportunity to try to remove
      the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
      we can reintroduce it and just avoid refcounting it.
      
      The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
      much use to them except on benchmarks.  All other users of ZERO_PAGE are
      converted just to use ZERO_PAGE(0) for simplicity. We can look at
      replacing them all and maybe ripping out ZERO_PAGE completely when we are
      more satisfied with this solution.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus "snif" Torvalds <torvalds@linux-foundation.org>
      557ed1fa
    • F
      readahead: combine file_ra_state.prev_index/prev_offset into prev_pos · f4e6b498
      Fengguang Wu 提交于
      Combine the file_ra_state members
      				unsigned long prev_index
      				unsigned int prev_offset
      into
      				loff_t prev_pos
      
      It is more consistent and better supports huge files.
      
      Thanks to Peter for the nice proposal!
      
      [akpm@linux-foundation.org: fix shift overflow]
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f4e6b498
  2. 16 10月, 2007 1 次提交
    • R
      docbook: fix filesystems content · e6716b87
      Randy Dunlap 提交于
      Fix filesystems docbook warnings.
      
      Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name'
      Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode'
      Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent'
      Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value'
      Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e6716b87
  3. 15 10月, 2007 7 次提交
  4. 14 10月, 2007 1 次提交
    • P
      lockdep: annotate dir vs file i_mutex · 14358e6d
      Peter Zijlstra 提交于
      On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
      > The circular lock seems to be this:
      > 
      > #1:
      > 
      >   sys_mmap2:              down_write(&mm->mmap_sem);
      >   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
      > 
      > 
      > #0:
      > 
      >   vfs_readdir:     mutex_lock(&inode->i_mutex);
      >    - during the readdir (filldir64), we take a user fault (missing page?)
      >     and call do_page_fault -
      >   do_page_fault:   down_read(&mm->mmap_sem);
      > 
      > 
      > So it does indeed look like a circular locking. Now the question is, "is
      > this a bug?".  Looking like the inode of #1 must be a file or something
      > else that you can mmap and the inode of #0 seems it must be a directory.
      > I would say "no".
      > 
      > Now if you can readdir on a file or mmap a directory, then this could be
      > an issue.
      > 
      > Otherwise, I'd love to see someone teach lockdep about this issue! ;-)
      
      Make a distinction between file and dir usage of i_mutex.
      The inode should be complete and unused at unlock_new_inode(), re-init
      i_mutex depending on its type.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      14358e6d
  5. 15 10月, 2007 1 次提交
  6. 14 10月, 2007 1 次提交
  7. 13 10月, 2007 24 次提交
反馈
建议
客服 返回
顶部