1. 26 Sep, 2006 1 commit
    • P
      [PATCH] mm: tracking shared dirty pages · d08b3851
      Committed by Peter Zijlstra
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - it's shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
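
The eligibility heuristic listed above can be sketched as a single predicate. This is a minimal illustration, not the kernel's code: the flag values are placeholders, and `struct vma_sketch`, `bdi_accounts_dirty`, `default_page_prot` and `vma_tracks_dirty()` are hypothetical names that stand in for the real VMA fields and for mapping_cap_account_dirty().

```c
#include <stdbool.h>

/* Placeholder flag values for illustration; the real VM_* constants
 * are defined in the kernel headers and may differ. */
#define VM_WRITE      0x0002UL
#define VM_SHARED     0x0008UL
#define VM_PFNMAP     0x0400UL
#define VM_INSERTPAGE 0x0800UL

/* Hypothetical, simplified stand-in for a VMA. */
struct vma_sketch {
    unsigned long vm_flags;
    bool bdi_accounts_dirty;  /* stands in for mapping_cap_account_dirty() */
    bool default_page_prot;   /* true if f_op->mmap() kept the default protection */
};

/* Returns true when all four conditions from the list hold. */
static bool vma_tracks_dirty(const struct vma_sketch *vma)
{
    const unsigned long want = VM_WRITE | VM_SHARED;

    if ((vma->vm_flags & want) != want)
        return false;                 /* not shared writeable */
    if (vma->vm_flags & (VM_PFNMAP | VM_INSERTPAGE))
        return false;                 /* 'special' mapping */
    if (!vma->bdi_accounts_dirty)
        return false;                 /* backing device does not account dirty pages */
    return vma->default_page_prot;    /* protection must be unchanged */
}
```

A private writeable mapping, or a shared one created by remap_pfn_range(), fails the first or second test and is left alone.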
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean(), a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages; if the page is found to be in a VMA that accounts dirty pages, it will
      also wrprotect the PTE.
      
      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 01 Aug, 2006 1 commit
  3. 01 Jul, 2006 2 commits
  4. 29 Jun, 2006 1 commit
  5. 28 Jun, 2006 1 commit
  6. 23 Jun, 2006 1 commit
    • J
      [PATCH] Kill PF_SYNCWRITE flag · b31dc66a
      Committed by Jens Axboe
      A process flag to indicate whether we are doing sync io is incredibly
      ugly. It also causes performance problems when one does a lot of async
      io and then proceeds to sync it. Part of the io will go out as async,
      and the other part as sync. This causes a disconnect between the
      previously submitted io and the synced io. For io schedulers such as CFQ,
      this causes lost merges and suboptimal scheduling behaviour.
      
      Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
      the O_DIRECT path just directly indicate that the writes are sync
      by using WRITE_SYNC instead.
      Signed-off-by: Jens Axboe <axboe@suse.de>
  7. 28 Mar, 2006 1 commit
  8. 27 Mar, 2006 5 commits
  9. 26 Mar, 2006 1 commit
  10. 24 Mar, 2006 4 commits
  11. 23 Mar, 2006 1 commit
  12. 22 Mar, 2006 1 commit
    • C
      [PATCH] page migration reorg · b20a3503
      Committed by Christoph Lameter
      Centralize the page migration functions in anticipation of additional
      tinkering.  Creates a new file mm/migrate.c
      
      1. Extract buffer_migrate_page() from fs/buffer.c
      
      2. Extract central migration code from vmscan.c
      
      3. Extract some components from mempolicy.c
      
      4. Export pageout() and remove_from_swap() from vmscan.c
      
      5. Make it possible to configure NUMA systems without page migration
         and non-NUMA systems with page migration.
      
      I had to do some #ifdeffing in mempolicy.c that may need a cleanup.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 15 Mar, 2006 1 commit
  14. 04 Feb, 2006 1 commit
  15. 02 Feb, 2006 2 commits
  16. 17 Jan, 2006 1 commit
  17. 15 Jan, 2006 1 commit
  18. 12 Jan, 2006 1 commit
  19. 10 Jan, 2006 1 commit
  20. 09 Jan, 2006 3 commits
    • A
      [PATCH] fix possible PAGE_CACHE_SHIFT overflows · 54b21a79
      Committed by Andrew Morton
      We've had two instances recently of overflows when doing
      
      	64_bit_value = (32_bit_value << PAGE_CACHE_SHIFT)
      
      I did a tree-wide grep of `<<.*PAGE_CACHE_SHIFT' and this is the result.
      
      - afs_rxfs_fetch_descriptor.offset is of type off_t, which seems broken.
      
      - jfs and jffs are limited to 4GB anyway.
      
      - reiserfs map_block_for_writepage() takes an unsigned long for the block -
        it should take sector_t.  (It'll fail for huge filesystems with
        blocksize<PAGE_CACHE_SIZE)
      
      - cramfs_read() needs to use sector_t (I think cramfs is busted on large
        filesystems anyway)
      
      - affs is limited in file size anyway.
      
      - I generally didn't fix 32-bit overflows in directory operations.
      
      - arm's __flush_dcache_page() is peculiar.  What if the page lies beyond 4G?
      
      - gss_wrap_req_priv() needs checking (snd_buf->page_base)
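
The overflow pattern named at the top of this message can be reproduced in a few lines. This is a sketch under stated assumptions: PAGE_CACHE_SHIFT is taken to be 12 (4 KiB pages), and `offset_truncated()` and `offset_widened()` are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

#define PAGE_CACHE_SHIFT 12  /* assumption: 4 KiB pages */

/* Buggy pattern: the shift is evaluated in 32 bits, so the high bits
 * are discarded before the result is widened to 64 bits. */
static uint64_t offset_truncated(uint32_t index)
{
    return (uint32_t)(index << PAGE_CACHE_SHIFT);
}

/* Fixed pattern: widen the operand first, then shift. */
static uint64_t offset_widened(uint32_t index)
{
    return (uint64_t)index << PAGE_CACHE_SHIFT;
}
```

Both helpers agree for small page indexes; they diverge once the byte offset no longer fits in 32 bits, which is the 4GB boundary mentioned in several of the items above.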
      
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: <reiserfs-dev@namesys.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: <linux-fsdevel@vger.kernel.org>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • O
      [PATCH] Fix and add EXPORT_SYMBOL(filemap_write_and_wait) · 28fd1298
      Committed by OGAWA Hirofumi
      This patch adds EXPORT_SYMBOL(filemap_write_and_wait) and uses it.
      
      It also changes filemap_write_and_wait() and
      filemap_write_and_wait_range() in mm/filemap.c.
      
      The current filemap_write_and_wait() doesn't wait if filemap_fdatawrite()
      returns an error.  However, even if filemap_fdatawrite() returned an
      error, it may have submitted some of the data pages to the device
      (e.g. in the case of -ENOSPC).
      
      <quotation>
      Andrew Morton writes,
      
      If filemap_fdatawrite() returns an error, this might be due to some
      I/O problem: dead disk, unplugged cable, etc.  Given the generally
      crappy quality of the kernel's handling of such exceptions, there's a
      good chance that the filemap_fdatawait() will get stuck in D state
      forever.
      </quotation>
      
      So, this patch waits after other errors, but doesn't wait if
      filemap_fdatawrite() returns -EIO.
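
The resulting control flow can be sketched as follows. This is an illustration only: `struct mapping_sketch` and the `sketch_*` functions are hypothetical stand-ins for the address_space and the filemap_fdatawrite()/filemap_fdatawait() pair, with the results supplied by the caller so the behaviour can be observed.

```c
#include <errno.h>

/* Hypothetical stand-in for an address_space; the fields let a caller
 * choose what the write and wait steps return. */
struct mapping_sketch {
    int write_result;  /* value the write step reports */
    int wait_result;   /* value the wait step reports */
    int wait_calls;    /* how many times the wait step ran */
};

static int sketch_fdatawrite(struct mapping_sketch *m)
{
    return m->write_result;
}

static int sketch_fdatawait(struct mapping_sketch *m)
{
    m->wait_calls++;
    return m->wait_result;
}

/* Wait even after a write error (pages may have been submitted, e.g. on
 * -ENOSPC), except for -EIO, where waiting might never finish. */
static int sketch_write_and_wait(struct mapping_sketch *m)
{
    int err = sketch_fdatawrite(m);

    if (err != -EIO) {
        int err2 = sketch_fdatawait(m);
        if (!err)
            err = err2;  /* keep the first error seen */
    }
    return err;
}
```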
      
      Trond, could you please review the nfs part?  In particular, I'm not sure
      whether nfs must use the "filemap_fdatawrite(inode->i_mapping) == 0" check or not.
      Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • O
      [PATCH] fat: support a truncate() for expanding size (generic_cont_expand) · 05eb0b51
      Committed by OGAWA Hirofumi
      This patch changes generic_cont_expand(), in order to share the code
      with fatfs.
      
        - Use vmtruncate() if ->prepare_write() returns an error.
      
      Even if ->prepare_write() returns an error, it may already have added some
      blocks.  So, this truncates blocks outside of ->i_size by vmtruncate().
      
        - Add generic_cont_expand_simple().
      
      The generic_cont_expand_simple() assumes that ->prepare_write() can handle
      the block boundary.  With this, we don't need to care about the extra byte.
      
      And for expanding a file size by truncate(), fatfs uses the
      added generic_cont_expand_simple().
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 07 Nov, 2005 1 commit
  22. 31 Oct, 2005 2 commits
    • A
      [PATCH] __bread oops fix · a3e713b5
      Committed by Andrew Morton
      If a filesystem passes an idiotic blocksize into bread(), __getblk_slow() will
      warn and will return NULL.  We have a report (from Hubert Tonneau
      <hubert.tonneau@fullpliant.org>) of isofs_fill_super() doing this (passing in
      a silly block size) against an unplugged CDROM drive.
      
      But a couple of __getblk_slow() callers forgot to check for the NULL bh, hence
      oops.
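
The missing check can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct buffer_head_sketch`, `sketch_getblk()` and `sketch_bread()` are stand-ins, and the accepted block-size range (a power of two between 512 and 4096) is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer head. */
struct buffer_head_sketch {
    int uptodate;
};

/* Stand-in for the lookup step: rejects a silly block size by
 * returning NULL instead of a buffer. */
static struct buffer_head_sketch *sketch_getblk(unsigned int size)
{
    static struct buffer_head_sketch bh;

    if (size < 512 || size > 4096 || (size & (size - 1)) != 0)
        return NULL;
    bh.uptodate = 0;
    return &bh;
}

/* Caller with the NULL check in place: without it, the write to
 * bh->uptodate below would dereference NULL. */
static struct buffer_head_sketch *sketch_bread(unsigned int size)
{
    struct buffer_head_sketch *bh = sketch_getblk(size);

    if (bh == NULL)
        return NULL;
    bh->uptodate = 1;  /* pretend the block was read from disk */
    return bh;
}
```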
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • J
      [PATCH] ext3: Fix unmapped buffers in transaction's lists · aaa4059b
      Committed by Jan Kara
      Fix the problem (BUG 4964) with unmapped buffers in transaction's
      t_sync_data list.  The problem is we need to call filesystem's own
      invalidatepage() from block_write_full_page().
      
      block_write_full_page() must call filesystem's invalidatepage().  Otherwise
      the following nasty race can happen:
      
         proc 1                                        proc 2
         ------                                        ------
      - write some new data to 'offset'
        => bh gets to the transactions data list
                                                    - starts truncate
                                                      => i_size set to new size
      - mpage_writepages()
        - ext3_ordered_writepage() to 'offset'
          - block_write_full_page()
            - page->index > end_index+1
              - block_invalidatepage()
                - discard_buffer()
                  - clear_buffer_mapped()
      
      - commit triggers and finds unmapped buffer - BOOM!
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 30 Oct, 2005 1 commit
    • H
      [PATCH] mm: split page table lock · 4c21e2f2
      Committed by Hugh Dickins
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
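
The compile-time selection described above, where the preprocessor compares the configured threshold with NR_CPUS, might look roughly like this. It is a sketch under assumptions: the values 8 and 4 are examples, and `SPLIT_PTLOCK_CPUS`, `USE_SPLIT_PTLOCK`, `struct mm_sketch`, `struct ptpage_sketch` and `pte_lockptr_sketch()` are illustrative names, with plain ints standing in for spinlocks.

```c
/* Example values, chosen for illustration only. */
#define NR_CPUS           8
#define SPLIT_PTLOCK_CPUS 4

/* Kconfig cannot express the inequality, so the preprocessor does it. */
#if NR_CPUS >= SPLIT_PTLOCK_CPUS
#define USE_SPLIT_PTLOCK 1
#else
#define USE_SPLIT_PTLOCK 0
#endif

struct mm_sketch {
    int page_table_lock;  /* stand-in for the mm-wide spinlock */
};

struct ptpage_sketch {
    int ptl;              /* stand-in for the lock kept in the page table page */
};

/* Pick the lock that guards the entries in one page table page. */
static int *pte_lockptr_sketch(struct mm_sketch *mm, struct ptpage_sketch *page)
{
#if USE_SPLIT_PTLOCK
    (void)mm;
    return &page->ptl;            /* one lock per page table page */
#else
    (void)page;
    return &mm->page_table_lock;  /* single lock for the whole mm */
#endif
}
```

With the split configuration, two threads working in different page table pages take different locks, which is the contention the patch removes.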
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  24. 28 Oct, 2005 2 commits
    • A
      [PATCH] gfp_t: fs/* · 27496a8c
      Committed by Al Viro
       - ->releasepage() annotated (s/int/gfp_t), instances updated
       - missing gfp_t in fs/* added
       - fixed misannotation from the original sweep caught by bitwise checks:
         XFS used __nocast both for gfp_t and for flags used by XFS allocator.
         The latter left with unsigned int __nocast; we might want to add a
         different type for those but for now let's leave them alone.  That,
         BTW, is a case when __nocast use had been actively confusing - it had
         been used in the same code for two different and similar types, with
         no way to catch misuses.  Switch of gfp_t to bitwise had caught that
         immediately...
      
      One tricky bit is left alone to be dealt with later - mapping->flags is
      a mix of gfp_t and error indications.  Left alone for now.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] gfp_t: infrastructure · af4ca457
      Committed by Al Viro
      Beginning of gfp_t annotations:
      
       - -Wbitwise added to CHECKFLAGS
       - old __bitwise renamed to __bitwise__
       - __bitwise defined to either __bitwise__ or nothing, depending on
         __CHECK_ENDIAN__ being defined
       - gfp_t switched from __nocast to __bitwise__
       - force cast to gfp_t added to __GFP_... constants
       - new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
         the result to int
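
A rough sketch of how these annotations fit together is shown below. It assumes the usual sparse convention that `__CHECKER__` is defined only when the checker runs, so the attributes vanish in a normal compile. The flag values and the zone mask are illustrative, not necessarily the ones used in the kernel at the time.

```c
/* Under sparse the attributes are active; a regular compiler sees nothing. */
#ifdef __CHECKER__
#define __bitwise__ __attribute__((bitwise))
#define __force     __attribute__((force))
#else
#define __bitwise__
#define __force
#endif

/* gfp_t is a distinct bitwise type, so mixing it with plain integers
 * is reported by the checker. */
typedef unsigned int __bitwise__ gfp_t;

/* Illustrative values; each constant is force-cast into gfp_t. */
#define __GFP_DMA     ((__force gfp_t)0x01u)
#define __GFP_HIGHMEM ((__force gfp_t)0x02u)
#define __GFP_WAIT    ((__force gfp_t)0x10u)

#define GFP_ZONEMASK  0x03  /* assumed mask covering the zone bits */

/* Extract the zone bits and hand them back as a plain int. */
static inline int gfp_zone(gfp_t gfp)
{
    return GFP_ZONEMASK & (__force int)gfp;
}
```

The force casts are confined to the constant definitions and to gfp_zone(), so ordinary callers never need to cast and any accidental int/gfp_t mix-up stays visible to the checker.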
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 09 Oct, 2005 1 commit
  26. 11 Sep, 2005 1 commit
    • I
      [PATCH] spinlock consolidation · fb1c8f93
      Committed by Ingo Molnar
      This patch (written by me and also containing many suggestions of Arjan van
      de Ven) does a major cleanup of the spinlock code.  It does the following
      things:
      
       - consolidates and enhances the spinlock/rwlock debugging code
      
       - simplifies the asm/spinlock.h files
      
       - encapsulates the raw spinlock type and moves generic spinlock
         features (such as ->break_lock) into the generic code.
      
       - cleans up the spinlock code hierarchy to get rid of the spaghetti.
      
      Most notably there's now only a single variant of the debugging code,
      located in lib/spinlock_debug.c.  (previously we had one SMP debugging
      variant per architecture, plus a separate generic one for UP builds)
      
      Also, I've enhanced the rwlock debugging facility; it will now track
      write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
      All locks have lockup detection now, which will work for both soft and hard
      spin/rwlock lockups.
      
      The arch-level include files now only contain the minimally necessary
      subset of the spinlock code - all the rest that can be generalized now
      lives in the generic headers:
      
       include/asm-i386/spinlock_types.h       |   16
       include/asm-x86_64/spinlock_types.h     |   16
      
      I have also split up the various spinlock variants into separate files,
      making it easier to see which does what. The new layout is:
      
         SMP                         |  UP
         ----------------------------|-----------------------------------
         asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
         linux/spinlock_types.h      |  linux/spinlock_types.h
         asm/spinlock_smp.h          |  linux/spinlock_up.h
         linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
         linux/spinlock.h            |  linux/spinlock.h
      
      /*
       * here's the role of the various spinlock/rwlock related include files:
       *
       * on SMP builds:
       *
       *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
       *                        initializers
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
       *                        implementations, mostly inline assembly code
       *
       *   (also included on UP-debug builds:)
       *
       *  linux/spinlock_api_smp.h:
       *                        contains the prototypes for the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       *
       * on UP builds:
       *
       *  linux/spinlock_types_up.h:
       *                        contains the generic, simplified UP spinlock type.
       *                        (which is an empty structure on non-debug builds)
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  linux/spinlock_up.h:
       *                        contains the __raw_spin_*()/etc. version of UP
       *                        builds. (which are NOPs on non-debug, non-preempt
       *                        builds)
       *
       *   (included on UP-non-debug builds:)
       *
       *  linux/spinlock_api_up.h:
       *                        builds the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       */
      
      All SMP and UP architectures are converted by this patch.
      
      arm, i386, ia64, ppc, ppc64, s390/s390x, x64 were build-tested via
      crosscompilers.  m32r, mips, sh, sparc have not been tested yet, but should
      be mostly fine.
      
      From: Grant Grundler <grundler@parisc-linux.org>
      
        Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
        Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
        non-SMP kernels.  That should be trivial to fix up later if necessary.
      
        I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
        some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
        are well tested and contained entirely inside arch specific code.  I do NOT
        expect any new issues to arise with them.
      
        If someone does ever need to use debug/metrics with them, then they will
        need to unravel this hairball between spinlocks, atomic ops, and bit ops
        that exist only because parisc has exactly one atomic instruction: LDCW
        (load and clear word).
      
      From: "Luck, Tony" <tony.luck@intel.com>
      
         ia64 fix
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
      Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
      Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  27. 08 Sep, 2005 1 commit