1. 22 5月, 2010 4 次提交
  2. 11 5月, 2010 1 次提交
  3. 24 4月, 2010 1 次提交
  4. 22 4月, 2010 1 次提交
    • E
      fasync: RCU and fine grained locking · 989a2979
      Eric Dumazet 提交于
      kill_fasync() uses a central rwlock, candidate for RCU conversion, to
      avoid cache line ping pongs on SMP.
      
      fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short
      section instead during whole list scan.
      
      Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and
      fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync()
      doesnt need its own implementation and can use fasync_helper(), to
      reduce code size and complexity.
      
      We can remove __kill_fasync() direct use in net/socket.c, and rename it
      to kill_fasync_rcu().
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      989a2979
  5. 07 4月, 2010 2 次提交
  6. 07 3月, 2010 2 次提交
    • A
      include/linux/fs.h: convert FMODE_* constants to hex · 19adf9c5
      Andrew Morton 提交于
      It was tolerable until Eric went and added 8388608.
      
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19adf9c5
    • W
      readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM · 0141450f
      Wu Fengguang 提交于
      This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.
      
      POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
      a 16K read will be carried out in 4 _sync_ 1-page reads.
      
      In other places, ra_pages==0 means
      - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
      - some IO error happened
      where multi-page read IO won't help or should be avoided.
      
      POSIX_FADV_RANDOM actually want a different semantics: to disable the
      *heuristic* readahead algorithm, and to use a dumb one which faithfully
      submit read IO for whatever application requests.
      
      So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.
      
      Note that the random hint is not likely to help random reads performance
      noticeably.  And it may be too permissive on huge request size (its IO
      size is not limited by read_ahead_kb).
      
      In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
      (NFS read) performance of the application increased by 313%!
      Tested-by: NQuentin Barnes <qbarnes+nfs@yahoo-inc.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@kernel.org>			[2.6.33.x]
      Cc: <qbarnes+nfs@yahoo-inc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0141450f
  7. 06 3月, 2010 1 次提交
  8. 04 3月, 2010 6 次提交
  9. 19 2月, 2010 1 次提交
  10. 14 1月, 2010 1 次提交
    • A
      Fix ACC_MODE() for real · 6d125529
      Al Viro 提交于
      commit 5300990c had stepped on a rather
      nasty mess: definitions of ACC_MODE used to be different.  Fixed the
      resulting breakage, converting them to variant that takes O_... value;
      all callers have that and it actually simplifies life (see tomoyo part
      of changes).
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6d125529
  11. 23 12月, 2009 3 次提交
  12. 18 12月, 2009 2 次提交
  13. 17 12月, 2009 3 次提交
    • D
      xfs: improve metadata I/O merging in the elevator · 2ee1abad
      Dave Chinner 提交于
      Change all async metadata buffers to use [READ|WRITE]_META I/O types
      so that the I/O doesn't get issued immediately. This allows merging of
      adjacent metadata requests but still prioritises them over bulk data.
      This shows a 10-15% improvement in sequential create speed of small
      files.
      
      Don't include the log buffers in this classification - leave them as
      sync types so they are issued immediately.
      Signed-off-by: NDave Chinner <dgc@sgi.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      2ee1abad
    • C
      cleanup blockdev_direct_IO locking · 1e431f5c
      Christoph Hellwig 提交于
      Currently the locking in blockdev_direct_IO is a mess, we have three different
      locking types and very confusing checks for some of them.  The most
      complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be
      used.
      
      This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case
      is unused anyway, and the write side is almost identical to DIO_NO_LOCKING.
      The difference is that DIO_NO_LOCKING always sets the create argument for
      the get_blocks callback to zero, but we can easily move that to the actual
      get_blocks callbacks.  There are four users of the DIO_NO_LOCKING mode:
      gfs already ignores the create argument and thus is fine with the new
      version, ocfs2 only errors out if create were ever set, and we can remove
      this dead code now, the block device code only ever uses create for an
      error message if we are fully beyond the device which can never happen,
      and last but not least XFS will need the new behavour for writes.
      
      Now we can replace the lock_type variable with a flags one, where no flag
      means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
      flag.  Separate out the check for not allowing to fill holes into a separate
      flag, although for now both flags always get set at the same time.
      
      Also revamp the documentation of the locking scheme to actually make sense.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1e431f5c
    • E
      fs: move get_empty_filp() deffinition to internal.h · e81e3f4d
      Eric Paris 提交于
      All users outside of fs/ of get_empty_filp() have been removed.  This patch
      moves the definition from the include/ directory to internal.h so no new
      users crop up and removes the EXPORT_SYMBOL.  I'd love to see open intents
      stop using it too, but that's a problem for another day and a smarter
      developer!
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NMiklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e81e3f4d
  14. 16 12月, 2009 1 次提交
    • C
      direct-io: cleanup blockdev_direct_IO locking · 5fe878ae
      Christoph Hellwig 提交于
      Currently the locking in blockdev_direct_IO is a mess, we have three
      different locking types and very confusing checks for some of them.  The
      most complicated one is DIO_OWN_LOCKING for reads, which happens to not
      actually be used.
      
      This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
      case is unused anyway, and the write side is almost identical to
      DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
      create argument for the get_blocks callback to zero, but we can easily
      move that to the actual get_blocks callbacks.  There are four users of the
      DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
      fine with the new version, ocfs2 only errors out if create were ever set,
      and we can remove this dead code now, the block device code only ever uses
      create for an error message if we are fully beyond the device which can
      never happen, and last but not least XFS will need the new behavour for
      writes.
      
      Now we can replace the lock_type variable with a flags one, where no flag
      means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
      flag.  Separate out the check for not allowing to fill holes into a
      separate flag, although for now both flags always get set at the same
      time.
      
      Also revamp the documentation of the locking scheme to actually make
      sense.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fe878ae
  15. 10 12月, 2009 1 次提交
  16. 03 12月, 2009 1 次提交
    • M
      block: Allow devices to indicate whether discarded blocks are zeroed · 98262f27
      Martin K. Petersen 提交于
      The discard ioctl is used by mkfs utilities to clear a block device
      prior to putting metadata down.  However, not all devices return zeroed
      blocks after a discard.  Some drives return stale data, potentially
      containing old superblocks.  It is therefore important to know whether
      discarded blocks are properly zeroed.
      
      Both ATA and SCSI drives have configuration bits that indicate whether
      zeroes are returned after a discard operation.  Implement a block level
      interface that allows this information to be bubbled up the stack and
      queried via a new block device ioctl.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      98262f27
  17. 26 11月, 2009 1 次提交
    • V
      Fix regression in direct writes performance due to WRITE_ODIRECT flag removal · d9449ce3
      Vivek Goyal 提交于
      There seems to be a regression in direct write path due to following
      commit in for-2.6.33 branch of block tree.
      
      commit 1af60fbd
      Author: Jeff Moyer <jmoyer@redhat.com>
      Date:   Fri Oct 2 18:56:53 2009 -0400
      
          block: get rid of the WRITE_ODIRECT flag
      
      Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets
      the NOIDLE flag in bio and hence in request. This tells CFQ to not expect
      more request from the queue and not idle on it (despite the fact that
      queue's think time is less and it is not seeky).
      
      So direct writers lose big time when competing with sequential readers.
      
      Using fio, I have run one direct writer and two sequential readers and
      following are the results with 2.6.32-rc7 kernel and with for-2.6.33
      branch.
      
      Test
      ====
      1 direct writer and 2 sequential reader running simultaneously.
      
      [global]
      directory=/mnt/sdc/fio/
      runtime=10
      
      [seqwrite]
      rw=write
      size=4G
      direct=1
      
      [seqread]
      rw=read
      size=2G
      numjobs=2
      
      2.6.32-rc7
      ==========
      direct writes: aggrb=2,968KB/s
      readers	     : aggrb=101MB/s
      
      for-2.6.33 branch
      =================
      direct write: aggrb=19KB/s
      readers	      aggrb=137MB/s
      
      This patch brings back the WRITE_ODIRECT flag, with the difference that we
      don't set the BIO_RW_UNPLUG flag so that device is not unplugged after
      submission of request and an explicit unplug from submitter is required.
      
      That way we fix the jeff's issue of not enough merging taking place in aio
      path as well as make sure direct writes get their fair share.
      
      After the fix
      =============
      for-2.6.33 + fix
      ----------------
      direct writes: aggrb=2,728KB/s
      reads: aggrb=103MB/s
      
      Thanks
      Vivek
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      d9449ce3
  18. 28 10月, 2009 1 次提交
  19. 04 10月, 2009 1 次提交
  20. 02 10月, 2009 1 次提交
  21. 24 9月, 2009 5 次提交
    • A
      sysctl: remove "struct file *" argument of ->proc_handler · 8d65af78
      Alexey Dobriyan 提交于
      It's unused.
      
      It isn't needed -- read or write flag is already passed and sysctl
      shouldn't care about the rest.
      
      It _was_ used in two places at arch/frv for some reason.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d65af78
    • N
      truncate: new helpers · 25d9e2d1
      npiggin@suse.de 提交于
      Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
      vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and
      into mm/truncate.c.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      25d9e2d1
    • C
      freeze_bdev: grab active reference to frozen superblocks · 4504230a
      Christoph Hellwig 提交于
      Currently we held s_umount while a filesystem is frozen, despite that we
      might return to userspace and unlock it from a different process.  Instead
      grab an active reference to keep the file system busy and add an explicit
      check for frozen filesystems in remount and reject the remount instead
      of blocking on s_umount.
      
      Add a new get_active_super helper to super.c for use by freeze_bdev that
      grabs an active reference to a superblock from a given block device.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4504230a
    • C
      freeze_bdev: kill bd_mount_sem · 4fadd7bb
      Christoph Hellwig 提交于
      Now that we have the freeze count there is not much reason for bd_mount_sem
      anymore.  The actual freeze/thaw operations are serialized using the
      bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
      get_sb_bdev which tries to prevent mounting a filesystem while the block
      device is frozen.  Instead of add a check for bd_fsfreeze_count and
      return -EBUSY if a filesystem is frozen.  While that is a change in user
      visible behaviour a failing mount is much better for this case rather
      than having the mount process stuck uninterruptible for a long time.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4fadd7bb
    • J
      vfs: change sb->s_maxbytes to a loff_t · 42cb56ae
      Jeff Layton 提交于
      sb->s_maxbytes is supposed to indicate the maximum size of a file that can
      exist on the filesystem.  It's declared as an unsigned long long.
      
      Even if a filesystem has no inherent limit that prevents it from using
      every bit in that unsigned long long, it's still problematic to set it to
      anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
      that cast s_maxbytes to a signed value.  If it's set too large then this
      cast makes it a negative number and generally breaks the comparison.
      
      Change s_maxbytes to be loff_t instead.  That should help eliminate the
      temptation to set it too large by making it a signed value.
      
      Also, add a warning for couple of releases to help catch filesystems that
      set s_maxbytes too large.  Eventually we can either convert this to a
      BUG() or just remove it and in the hope that no one will get it wrong now
      that it's a signed value.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Cc: Mandeep Singh Baines <msb@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      42cb56ae