1. 11 12月, 2006 1 次提交
    • Z
      [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED · 8459d86a
      Zach Brown 提交于
      The only time it is safe to call aio_complete() is when the ->ki_retry
      function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
      historically done this by relying on its caller to translate positive return
      codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
      conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
      going to call aio_complete().  It would reverse the test and wait and free the
      dio in the cases it thought that finished_one_bio() wasn't going to.
      
      Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
      errno from the submission path but it failed to communicate this to
      finished_one_bio().  direct_io_worker() would return < 0, it's callers
      wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
      future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
      would be called for a second time which can manifest as an oops.
      
      The previous cleanups have whittled the sync and async completion paths down
      to the point where we can collapse them and clearly reassert the invariant
      that we must only call aio_complete() after returning -EIOCBQUEUED.
      direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
      drop the dio refcount and the aio bio completion path will only call
      aio_complete() when it is the last to drop the dio refcount.
      direct_io_worker() can ensure that it is the last to drop the reference count
      by waiting for bios to drain.  It does this for sync ops, of course, and for
      partial dio writes that must fall back to buffered and for aio ops that saw
      errors during submission.
      
      This means that operations that end up waiting, even if they were issued as
      aio ops, will not call aio_complete() from dio.  Instead we return the return
      code of the operation and let the aio core call aio_complete().  This is
      purposely done to fix a bug where AIO DIO file extensions would call
      aio_complete() before their callers have a chance to update i_size.
      
      Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
      no longer have to translate for it.  XFS needs to be careful not to free
      resources that will be used during AIO completion if -EIOCBQUEUED is returned.
       We maintain the previous behaviour of trying to write fs metadata for O_SYNC
      aio+dio writes.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: <xfs-masters@oss.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8459d86a
  2. 09 12月, 2006 1 次提交
  3. 08 12月, 2006 1 次提交
  4. 02 12月, 2006 1 次提交
  5. 29 10月, 2006 1 次提交
  6. 21 10月, 2006 2 次提交
  7. 20 10月, 2006 1 次提交
  8. 04 10月, 2006 1 次提交
  9. 01 10月, 2006 5 次提交
    • B
      [PATCH] Streamline generic_file_* interfaces and filemap cleanups · 543ade1f
      Badari Pulavarty 提交于
      This patch cleans up generic_file_*_read/write() interfaces.  Christoph
      Hellwig gave me the idea for this clean ups.
      
      In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
      do_sync_read/ do_sync_write() as their .read/.write methods.  This allows us
      to cleanup all variants of generic_file_* routines.
      
      Final available interfaces:
      
      generic_file_aio_read() - read handler
      generic_file_aio_write() - write handler
      generic_file_aio_write_nolock() - no lock write handler
      
      __generic_file_aio_write_nolock() - internal worker routine
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      543ade1f
    • B
      [PATCH] Remove readv/writev methods and use aio_read/aio_write instead · ee0b3e67
      Badari Pulavarty 提交于
      This patch removes readv() and writev() methods and replaces them with
      aio_read()/aio_write() methods.
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ee0b3e67
    • B
      [PATCH] Vectorize aio_read/aio_write fileop methods · 027445c3
      Badari Pulavarty 提交于
      This patch vectorizes aio_read() and aio_write() methods to prepare for
      collapsing all aio & vectored operations into one interface - which is
      aio_read()/aio_write().
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      027445c3
    • D
      [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells 提交于
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
           	 taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9361401e
    • D
      [PATCH] BLOCK: Move functions out of buffer code [try #6] · cf9a2ae8
      David Howells 提交于
      Move some functions out of the buffering code that aren't strictly buffering
      specific.  This is a precursor to being able to disable the block layer.
      
       (*) Moved some stuff out of fs/buffer.c:
      
           (*) The file sync and general sync stuff moved to fs/sync.c.
      
           (*) The superblock sync stuff moved to fs/super.c.
      
           (*) do_invalidatepage() moved to mm/truncate.c.
      
           (*) try_to_release_page() moved to mm/filemap.c.
      
       (*) Moved some related declarations between header files:
      
           (*) declarations for do_invalidatepage() and try_to_release_page() moved
           	 to linux/mm.h.
      
           (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      cf9a2ae8
  10. 30 9月, 2006 1 次提交
  11. 28 9月, 2006 2 次提交
  12. 26 9月, 2006 2 次提交
  13. 30 7月, 2006 1 次提交
  14. 26 7月, 2006 1 次提交
    • S
      [GFS2] Alter direct I/O path · a9e5f4d0
      Steven Whitehouse 提交于
      As per comments received, alter the GFS2 direct I/O path so that
      it uses the standard read functions "out of the box". Needs a
      small change to one of the VFS functions. This reduces the size
      of the code quite a lot and also removes the need for one new export.
      
      Some more work remains to be done, but this is the bones of the
      thing.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a9e5f4d0
  15. 01 7月, 2006 3 次提交
  16. 30 6月, 2006 1 次提交
    • A
      [PATCH] generic_file_buffered_write(): handle zero-length iovec segments · 81b0c871
      Andrew Morton 提交于
      The recent generic_file_write() deadlock fix caused
      generic_file_buffered_write() to loop inifinitely when presented with a
      zero-length iovec segment.  Fix.
      
      Note that this fix deliberately avoids calling ->prepare_write(),
      ->commit_write() etc with a zero-length write.  This is because I don't trust
      all filesystems to get that right.
      
      This is a cautious approach, for 2.6.17.x.  For 2.6.18 we should just go ahead
      and call ->prepare_write() and ->commit_write() with the zero length and fix
      any broken filesystems.  So I'll make that change once this code is stabilised
      and backported into 2.6.17.x.
      
      The reason for preferring to call ->prepare_write() and ->commit_write() with
      the zero-length segment: a zero-length segment _should_ be sufficiently
      uncommon that this is the correct way of handling it.  We don't want to
      optimise for poorly-written userspace at the expense of well-written
      userspace.
      
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: <stable@kernel.org>
      Cc: walt <wa1ter@myrealbox.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      81b0c871
  17. 29 6月, 2006 1 次提交
  18. 28 6月, 2006 1 次提交
    • V
      [PATCH] generic_file_buffered_write(): deadlock on vectored write · 6527c2bd
      Vladimir V. Saveliev 提交于
      generic_file_buffered_write() prefaults in user pages in order to avoid
      deadlock on copying from the same page as write goes to.
      
      However, it looks like there is a problem when write is vectored:
      fault_in_pages_readable brings in current segment or its part (maxlen).
      OTOH, filemap_copy_from_user_iovec is called to copy number of bytes
      (bytes) which may exceed current segment, so filemap_copy_from_user_iovec
      switches to the next segment which is not brought in yet.  Pagefault is
      generated.  That causes the deadlock if pagefault is for the same page
      write goes to: page being written is locked and not uptodate, pagefault
      will deadlock trying to lock locked page.
      
      [akpm@osdl.org: somewhat rewritten]
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6527c2bd
  19. 26 6月, 2006 2 次提交
    • W
      [PATCH] readahead: backoff on I/O error · 76d42bd9
      Wu Fengguang 提交于
      Backoff readahead size exponentially on I/O error.
      
      Michael Tokarev <mjt@tls.msk.ru> described the problem as:
      
      [QUOTE]
      Suppose there's a CD-rom with a scratch/etc, one sector is unreadable.
      In order to "fix" it, one have to read it and write to another CD-rom,
      or something.. or just ignore the error (if it's just a skip in a video
      stream).  Let's assume the unreadable block is number U.
      
      But current behavior is just insane.  An application requests block
      number N, which is before U. Kernel tries to read-ahead blocks N..U.
      Cdrom drive tries to read it, re-read it.. for some time.  Finally,
      when all the N..U-1 blocks are read, kernel returns block number N
      (as requested) to an application, successefully.
      
      Now an app requests block number N+1, and kernel tries to read
      blocks N+1..U+1.  Retrying again as in previous step.
      
      And so on, up to when an app requests block number U-1.  And when,
      finally, it requests block U, it receives read error.
      
      So, kernel currentry tries to re-read the same failing block as
      many times as the current readahead value (256 (times?) by default).
      
      This whole process already killed my cdrom drive (I posted about it
      to LKML several months ago) - literally, the drive has fried, and
      does not work anymore.  Ofcourse that problem was a bug in firmware
      (or whatever) of the drive *too*, but.. main problem with that is
      current readahead logic as described above.
      [/QUOTE]
      
      Which was confirmed by Jens Axboe <axboe@suse.de>:
      
      [QUOTE]
      For ide-cd, it tends do only end the first part of the request on a
      medium error. So you may see a lot of repeats :/
      [/QUOTE]
      
      With this patch, retries are expected to be reduced from, say, 256, to 5.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: NWu Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      76d42bd9
    • N
      [PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytes · 01408c49
      NeilBrown 提交于
      The problem is that when we write to a file, the copy from userspace to
      pagecache is first done with preemption disabled, so if the source address is
      not immediately available the copy fails *and* *zeros* *the* *destination*.
      
      This is a problem because a concurrent read (which admittedly is an odd thing
      to do) might see zeros rather that was there before the write, or what was
      there after, or some mixture of the two (any of these being a reasonable thing
      to see).
      
      If the copy did fail, it will immediately be retried with preemption
      re-enabled so any transient problem with accessing the source won't cause an
      error.
      
      The first copying does not need to zero any uncopied bytes, and doing so
      causes the problem.  It uses copy_from_user_atomic rather than copy_from_user
      so the simple expedient is to change copy_from_user_atomic to *not* zero out
      bytes on failure.
      
      The first of these two patches prepares for the change by fixing two places
      which assume copy_from_user_atomic does zero the tail.  The two usages are
      very similar pieces of code which copy from a userspace iovec into one or more
      page-cache pages.  These are changed to remove the assumption.
      
      The second patch changes __copy_from_user_inatomic* to not zero the tail.
      Once these are accepted, I will look at similar patches of other architectures
      where this is important (ppc, mips and sparc being the ones I can find).
      
      This patch:
      
      There is a problem with __copy_from_user_inatomic zeroing the tail of the
      buffer in the case of an error.  As it is called in atomic context, the error
      may be transient, so it results in zeros being written where maybe they
      shouldn't be.
      
      In the usage in filemap, this opens a window for a well timed read to see data
      (zeros) which is not consistent with any ordering of reads and writes.
      
      Most cases where __copy_from_user_inatomic is called, a failure results in
      __copy_from_user being called immediately.  As long as the latter zeros the
      tail, the former doesn't need to.  However in *copy_from_user_iovec
      implementations (in both filemap and ntfs/file), it is assumed that
      copy_from_user_inatomic will zero the tail.
      
      This patch removes that assumption, so that after this patch it will
      be safe for copy_from_user_inatomic to not zero the tail.
      
      This patch also adds some commentary to filemap.h and asm-i386/uaccess.h.
      
      After this patch, all architectures that might disable preempt when
      kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed".
      This includes
       - powerpc
       - i386
       - mips
       - sparc
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      01408c49
  20. 23 6月, 2006 3 次提交
    • H
      [PATCH] x86: cache pollution aware __copy_from_user_ll() · c22ce143
      Hiro Yoshioka 提交于
      Use the x86 cache-bypassing copy instructions for copy_from_user().
      
      Some performance data are
      
      Total of GLOBAL_POWER_EVENTS (CPU cycle samples)
      
      2.6.12.4.orig    1921587
      2.6.12.4.nt      1599424
      1599424/1921587=83.23% (16.77% reduction)
      
      BSQ_CACHE_REFERENCE (L3 cache miss)
      2.6.12.4.orig      57427
      2.6.12.4.nt        20858
      20858/57427=36.32% (63.7% reduction)
      
      L3 cache miss reduction of __copy_from_user_ll
      samples  %
      37408    65.1412  vmlinux                  __copy_from_user_ll
      23        0.1103  vmlinux                  __copy_user_zeroing_intel_nocache
      23/37408=0.061% (99.94% reduction)
      
      Top 5 of 2.6.12.4.nt
      Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
      samples  %        app name                 symbol name
      128392    8.0274  vmlinux                  __copy_user_zeroing_intel_nocache
      64206     4.0143  vmlinux                  journal_add_journal_head
      59746     3.7355  vmlinux                  do_get_write_access
      47674     2.9807  vmlinux                  journal_put_journal_head
      46021     2.8774  vmlinux                  journal_dirty_metadata
      pattern9-0-cpu4-0-09011728/summary.out
      
      Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x3f (multiple flags) count 3000
      samples  %        app name                 symbol name
      69755     4.2861  vmlinux                  __copy_user_zeroing_intel_nocache
      55685     3.4215  vmlinux                  journal_add_journal_head
      52371     3.2179  vmlinux                  __find_get_block
      45504     2.7960  vmlinux                  journal_put_journal_head
      36005     2.2123  vmlinux                  journal_stop
      pattern9-0-cpu4-0-09011744/summary.out
      
      Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x200 (read 3rd level cache miss) count 3000
      samples  %        app name                 symbol name
      1147      5.4994  vmlinux                  journal_add_journal_head
      881       4.2240  vmlinux                  journal_dirty_data
      872       4.1809  vmlinux                  blk_rq_map_sg
      734       3.5192  vmlinux                  journal_commit_transaction
      617       2.9582  vmlinux                  radix_tree_delete
      pattern9-0-cpu4-0-09011731/summary.out
      
      iozone results are
      
      original 2.6.12.4 CPU time = 207.768 sec
      cache aware       CPU time = 184.783 sec
      (three times run)
      184.783/207.768=88.94% (11.06% reduction)
      
      original:
      pattern9-0-cpu4-0-08191720/iozone.out:  CPU Utilization: Wall time   45.997    CPU time   64.527    CPU utilization 140.28 %
      pattern9-0-cpu4-0-08191741/iozone.out:  CPU Utilization: Wall time   46.878    CPU time   71.933    CPU utilization 153.45 %
      pattern9-0-cpu4-0-08191743/iozone.out:  CPU Utilization: Wall time   45.152    CPU time   71.308    CPU utilization 157.93 %
      
      cache awre:
      pattern9-0-cpu4-0-09011728/iozone.out:  CPU Utilization: Wall time   44.842    CPU time   62.465    CPU utilization 139.30 %
      pattern9-0-cpu4-0-09011731/iozone.out:  CPU Utilization: Wall time   44.718    CPU time   59.273    CPU utilization 132.55 %
      pattern9-0-cpu4-0-09011744/iozone.out:  CPU Utilization: Wall time   44.367    CPU time   63.045    CPU utilization 142.10 %
      Signed-off-by: NHiro Yoshioka <hyoshiok@miraclelinux.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c22ce143
    • R
      [PATCH] kernel-doc for mm/filemap.c · 485bb99b
      Randy Dunlap 提交于
      mm/filemap.c:
      - add lots of kernel-doc;
      - fix some typos and kernel-doc errors;
      - drop some blank lines between function close and EXPORT_SYMBOL();
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      485bb99b
    • O
      [PATCH] writeback: fix range handling · 111ebb6e
      OGAWA Hirofumi 提交于
      When a writeback_control's `start' and `end' fields are used to
      indicate a one-byte-range starting at file offset zero, the required
      values of .start=0,.end=0 mean that the ->writepages() implementation
      has no way of telling that it is being asked to perform a range
      request.  Because we're currently overloading (start == 0 && end == 0)
      to mean "this is not a write-a-range request".
      
      To make all this sane, the patch changes range of writeback_control.
      
      So caller does: If it is calling ->writepages() to write pages, it
      sets range (range_start/end or range_cyclic) always.
      
      And if range_cyclic is true, ->writepages() thinks the range is
      cyclic, otherwise it just uses range_start and range_end.
      
      This patch does,
      
          - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
            -1 is usually ok for range_end (type is long long). But, if someone did,
      
      		range_end += val;		range_end is "val - 1"
      		u64val = range_end >> bits;	u64val is "~(0ULL)"
      
            or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
            things, and uses LLONG_MAX for range_end.
      
          - All callers of ->writepages() sets range_start/end or range_cyclic.
      
          - Fix updates of ->writeback_index. It seems already bit strange.
            If it starts at 0 and ended by check of nr_to_write, this last
            index may reduce chance to scan end of file.  So, this updates
            ->writeback_index only if range_cyclic is true or whole-file is
            scanned.
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      111ebb6e
  21. 21 6月, 2006 1 次提交
  22. 27 4月, 2006 1 次提交
  23. 24 3月, 2006 3 次提交
    • A
      [PATCH] fadvise(): write commands · ebcf28e1
      Andrew Morton 提交于
      Add two new linux-specific fadvise extensions():
      
      LINUX_FADV_ASYNC_WRITE: start async writeout of any dirty pages between file
      offsets `offset' and `offset+len'.  Any pages which are currently under
      writeout are skipped, whether or not they are dirty.
      
      LINUX_FADV_WRITE_WAIT: wait upon writeout of any dirty pages between file
      offsets `offset' and `offset+len'.
      
      By combining these two operations the application may do several things:
      
      LINUX_FADV_ASYNC_WRITE: push some or all of the dirty pages at the disk.
      
      LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE: push all of the currently dirty
      pages at the disk.
      
      LINUX_FADV_WRITE_WAIT, LINUX_FADV_ASYNC_WRITE, LINUX_FADV_WRITE_WAIT: push all
      of the currently dirty pages at the disk, wait until they have been written.
      
      It should be noted that none of these operations write out the file's
      metadata.  So unless the application is strictly performing overwrites of
      already-instantiated disk blocks, there are no guarantees here that the data
      will be available after a crash.
      
      To complete this suite of operations I guess we should have a "sync file
      metadata only" operation.  This gives applications access to all the building
      blocks needed for all sorts of sync operations.  But sync-metadata doesn't fit
      well with the fadvise() interface.  Probably it should be a new syscall:
      sys_fmetadatasync().
      
      The patch also diddles with the meaning of `endbyte' in sys_fadvise64_64().
      It is made to represent that last affected byte in the file (ie: it is
      inclusive).  Generally, all these byterange and pagerange functions are
      inclusive so we can easily represent EOF with -1.
      
      As Ulrich notes, these two functions are somewhat abusive of the fadvise()
      concept, which appears to be "set the future policy for this fd".
      
      But these commands are a perfect fit with the fadvise() impementation, and
      several of the existing fadvise() commands are synchronous and don't affect
      future policy either.   I think we can live with the slight incongruity.
      
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ebcf28e1
    • A
      [PATCH] filemap_fdatawrite_range() api: clarify -end parameter · 469eb4d0
      Andrew Morton 提交于
      I had trouble understanding working out whether filemap_fdatawrite_range()'s
      `end' parameter describes the last-byte-to-be-written or the last-plus-one.
      Clarify that in comments.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      469eb4d0
    • P
      [PATCH] cpuset memory spread page cache implementation and hooks · 44110fe3
      Paul Jackson 提交于
      Change the page cache allocation calls to support cpuset memory spreading.
      
      See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
      spreading.
      
      On systems without cpusets configured in the kernel, this is no change.
      
      On systems with cpusets configured in the kernel, but the "memory_spread"
      cpuset option not enabled for the current tasks cpuset, this adds a call to a
      cpuset routine and failed bit test of the processor state flag PF_SPREAD_PAGE.
      
      On tasks in cpusets with "memory_spread" enabled, this adds a call to a cpuset
      routine that computes which of the tasks mems_allowed nodes should be
      preferred for this allocation.
      
      If memory spreading applies to a particular allocation, then any other NUMA
      mempolicy does not apply.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      44110fe3
  24. 22 3月, 2006 1 次提交
  25. 30 1月, 2006 1 次提交
  26. 19 1月, 2006 1 次提交