1. 31 July 2013, 2 commits
    • aio: convert the ioctx list to table lookup v3 · db446a08
      Benjamin LaHaise committed
      On Wed, Jun 12, 2013 at 11:14:40AM -0700, Kent Overstreet wrote:
      > On Mon, Apr 15, 2013 at 02:40:55PM +0300, Octavian Purdila wrote:
      > > When using a large number of threads performing AIO operations the
      > > IOCTX list may get a significant number of entries which will cause
      > > significant overhead. For example, when running this fio script:
      > >
      > > rw=randrw; size=256k ;directory=/mnt/fio; ioengine=libaio; iodepth=1
      > > blocksize=1024; numjobs=512; thread; loops=100
      > >
      > > on an EXT2 filesystem mounted on top of a ramdisk we can observe up to
      > > 30% CPU time spent by lookup_ioctx:
      > >
      > >  32.51%  [guest.kernel]  [g] lookup_ioctx
      > >   9.19%  [guest.kernel]  [g] __lock_acquire.isra.28
      > >   4.40%  [guest.kernel]  [g] lock_release
      > >   4.19%  [guest.kernel]  [g] sched_clock_local
      > >   3.86%  [guest.kernel]  [g] local_clock
      > >   3.68%  [guest.kernel]  [g] native_sched_clock
      > >   3.08%  [guest.kernel]  [g] sched_clock_cpu
      > >   2.64%  [guest.kernel]  [g] lock_release_holdtime.part.11
      > >   2.60%  [guest.kernel]  [g] memcpy
      > >   2.33%  [guest.kernel]  [g] lock_acquired
      > >   2.25%  [guest.kernel]  [g] lock_acquire
      > >   1.84%  [guest.kernel]  [g] do_io_submit
      > >
      > > This patch converts the ioctx list to a radix tree. For a performance
      > > comparison the above FIO script was run on a 2-socket, 8-core
      > > machine. These are the results (average and %rsd of 10 runs) for the
      > > original list-based implementation and for the radix-tree-based
      > > implementation:
      > >
      > > cores            1          2          4          8         16         32
      > > list     109376 ms   69119 ms   35682 ms   22671 ms   19724 ms   16408 ms
      > > %rsd         0.69%      1.15%      1.17%      1.21%      1.71%      1.43%
      > > radix     73651 ms   41748 ms   23028 ms   16766 ms   15232 ms   13787 ms
      > > %rsd         1.19%      0.98%      0.69%      1.13%      0.72%      0.75%
      > > radix rel.
      > > to list     66.12%     65.59%     66.63%     72.31%     77.26%     83.66%
      > >
      > > To consider the impact of the patch on the typical case of having
      > > only one ctx per process the following FIO script was run:
      > >
      > > rw=randrw; size=100m ;directory=/mnt/fio; ioengine=libaio; iodepth=1
      > > blocksize=1024; numjobs=1; thread; loops=100
      > >
      > > on the same system and the results are the following:
      > >
      > > list        58892 ms
      > > %rsd         0.91%
      > > radix       59404 ms
      > > %rsd         0.81%
      > > radix rel.
      > > to list    100.87%
      >
      > So, I was just doing some benchmarking/profiling to get ready to send
      > out the aio patches I've got for 3.11 - and it looks like your patch is
      > causing a ~1.5% throughput regression in my testing :/
      ... <snip>
      
      I've got an alternate approach for fixing this wart in lookup_ioctx()...
      Instead of using an rbtree, just use the reserved id in the ring buffer
      header to index an array pointing to the ioctx.  It's not finished yet, and
      it needs to be tidied up, but is most of the way there.
      
      		-ben
      --
      "Thought is the essence of where you are now."
      --
      kmo> And, a rework of Ben's code, but this was entirely his idea
      kmo>		-Kent
      
      bcrl> And fix the code to use the right mm_struct in kill_ioctx(), and
      actually free the memory.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
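      
      A condensed sketch of the approach described above: an RCU-protected
      array hung off the mm, indexed by the id stored in the ring header.
      Type and field names follow the commit description; the merged code
      also takes a reference on the ctx before returning it, omitted here:
      
          struct kioctx_table {
                  struct rcu_head rcu;
                  unsigned        nr;
                  struct kioctx   *table[];
          };
      
          static struct kioctx *lookup_ioctx(unsigned long ctx_id)
          {
                  struct aio_ring __user *ring = (void __user *)ctx_id;
                  struct kioctx_table *table;
                  struct kioctx *ctx = NULL;
                  unsigned id;
      
                  /* the reserved id field in the ring header names the slot */
                  if (get_user(id, &ring->id))
                          return NULL;
      
                  rcu_read_lock();
                  table = rcu_dereference(current->mm->ioctx_table);
                  if (table && id < table->nr) {
                          ctx = table->table[id];
                          /* verify the slot really maps back to this ctx_id */
                          if (ctx && ctx->user_id != ctx_id)
                                  ctx = NULL;
                  }
                  rcu_read_unlock();
                  return ctx;
          }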
    • aio: double aio_max_nr in calculations · 4cd81c3d
      Benjamin LaHaise committed
      With the changes to use percpu counters for aio event ring size calculation,
      existing increases to aio_max_nr are now insufficient to allow for the
      allocation of enough events.  Double the value used for aio_max_nr to account
      for the doubling introduced by the percpu slack.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
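      
      Roughly, the accounting check in ioctx_alloc() becomes the following
      (a sketch based on the description above; error paths simplified):
      
          spin_lock(&aio_nr_lock);
          if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
              aio_nr + nr_events < aio_nr) {      /* overflow check */
                  spin_unlock(&aio_nr_lock);
                  goto err;
          }
          aio_nr += ctx->max_reqs;
          spin_unlock(&aio_nr_lock);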
  2. 30 July 2013, 9 commits
    • aio: Kill ki_dtor · d29c445b
      Kent Overstreet committed
      sock_aio_dtor() is dead code - and stuff that does need to do cleanup
      can simply do it before calling aio_complete().
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
    • aio: Kill ki_users · 57282d8f
      Kent Overstreet committed
      The kiocb refcount is only needed for cancellation - to ensure a kiocb
      isn't freed while a ki_cancel callback is running. But if we restrict
      ki_cancel callbacks to not block (which they currently don't), we can
      simply drop the refcount.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
    • aio: Kill unneeded kiocb members · 8bc92afc
      Kent Overstreet committed
      The old aio retry infrastructure needed to save the various arguments
      to aio operations. But with the retry infrastructure gone, we can trim
      struct kiocb quite a bit.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
    • aio: Kill aio_rw_vect_retry() · 73a7075e
      Kent Overstreet committed
      This code doesn't serve any purpose anymore, since the aio retry
      infrastructure has been removed.
      
      This change should be safe because aio_read/write are also used for
      synchronous IO, and called from do_sync_read()/do_sync_write() - and
      there's no looping done in the sync case (the read and write syscalls).
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
    • aio: Don't use ctx->tail unnecessarily · 5ffac122
      Kent Overstreet committed
      aio_complete() (arguably) needs to keep its own trusted copy of the tail
      pointer, but io_getevents() doesn't have to use it - it's already using
      the head pointer from the ring buffer.
      
      So convert it to use the tail from the ring buffer so it touches fewer
      cachelines and doesn't contend with the cacheline aio_complete() needs.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
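      
      A sketch of the io_getevents() side after the change: both cursors
      come from the ring mapping, so the read path no longer touches the
      cacheline that aio_complete() writes (details simplified):
      
          ring = kmap_atomic(ctx->ring_pages[0]);
          head = ring->head;
          tail = ring->tail;      /* was: tail = ctx->tail */
          kunmap_atomic(ring);
      
          if (head == tail)
                  goto out;       /* ring is empty, nothing to copy out */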
    • aio: io_cancel() no longer returns the io_event · bec68faa
      Kent Overstreet committed
      Originally, io_cancel() was documented to return the io_event if
      cancellation succeeded - the io_event wouldn't be delivered via the ring
      buffer like it normally would.
      
      But this isn't what the implementation was actually doing; the only
      driver implementing cancellation, the usb gadget code, never returned an
      io_event in its cancel function. And aio_complete() was recently changed
      to no longer suppress event delivery if the kiocb had been cancelled.
      
      This gets rid of the unused io_event argument to kiocb_cancel() and
      kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
      kiocb->ki_cancel() returned success.
      
      Also tweak the refcounting in kiocb_cancel() to make more sense.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
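      
      The new convention in io_cancel(), sketched (error handling omitted):
      if the driver's cancel callback succeeds, the event still arrives
      through the ring buffer, so userspace is told to wait for it there:
      
          ret = kiocb_cancel(ctx, kiocb);
          if (!ret) {
                  /*
                   * The result of the cancelled iocb will be delivered
                   * via the ring buffer as usual; report -EINPROGRESS
                   * instead of copying out an io_event here.
                   */
                  ret = -EINPROGRESS;
          }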
    • aio: percpu ioctx refcount · 723be6e3
      Kent Overstreet committed
      This just converts the ioctx refcount to the new generic dynamic percpu
      refcount code.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
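      
      In outline (a sketch against the 3.11-era percpu-refcount API; the
      release callback shown is illustrative - the real teardown defers
      the actual freeing):
      
          static void free_ioctx_ref(struct percpu_ref *ref)
          {
                  struct kioctx *ctx = container_of(ref, struct kioctx, users);
      
                  free_ioctx(ctx);
          }
      
          /* setup */
          if (percpu_ref_init(&ctx->users, free_ioctx_ref))
                  goto err;
      
          /* hot path: replaces atomic_inc()/atomic_dec_and_test() */
          percpu_ref_get(&ctx->users);
          percpu_ref_put(&ctx->users);
      
          /* teardown: drop to atomic mode and release the base ref */
          percpu_ref_kill(&ctx->users);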
    • aio: percpu reqs_available · e1bdd5f2
      Kent Overstreet committed
      See the previous patch ("aio: reqs_active -> reqs_available") for why we
      want to do this - this basically implements a per-cpu allocator for
      reqs_available that doesn't actually allocate anything.
      
      Note that we need to increase the size of the ringbuffer we allocate,
      since a single thread won't necessarily be able to use all the
      reqs_available slots - some (up to about half) might be on other per cpu
      lists, unavailable for the current thread.
      
      We size the ringbuffer based on the nr_events userspace passed to
      io_setup(), so this is a slight behaviour change - but nr_events wasn't
      being used as a hard limit before; it was already being rounded up to
      the next page, so this doesn't change the actual semantics.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
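      
      The shape of the per-cpu allocator, sketched from the description
      above: each CPU caches a few slots locally and refills from the
      shared atomic counter in batches, so the common case never touches
      the shared cacheline:
      
          struct kioctx_cpu {
                  unsigned reqs_available;
          };
      
          static bool get_reqs_available(struct kioctx *ctx)
          {
                  struct kioctx_cpu *kcpu;
                  bool ret = false;
      
                  preempt_disable();
                  kcpu = this_cpu_ptr(ctx->cpu);
      
                  if (!kcpu->reqs_available) {
                          int old, avail = atomic_read(&ctx->reqs_available);
      
                          /* refill the local cache one batch at a time */
                          do {
                                  if (avail < ctx->req_batch)
                                          goto out;
                                  old = avail;
                                  avail = atomic_cmpxchg(&ctx->reqs_available,
                                                         avail,
                                                         avail - ctx->req_batch);
                          } while (avail != old);
      
                          kcpu->reqs_available += ctx->req_batch;
                  }
      
                  ret = true;
                  kcpu->reqs_available--;
          out:
                  preempt_enable();
                  return ret;
          }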
    • aio: reqs_active -> reqs_available · 34e83fc6
      Kent Overstreet committed
      The number of outstanding kiocbs is one of the few shared things left that
      has to be touched for every kiocb - it'd be nice to make it percpu.
      
      We can make it per-cpu by treating it like an allocation problem: we have
      a maximum number of kiocbs that can be outstanding (i.e. slots) - then we
      just allocate and free slots, and we know how to write per-cpu allocators.
      
      So as prep work for that, we convert reqs_active to reqs_available.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
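      
      The flip in miniature (a sketch, not the full patch): count free
      slots down from the maximum instead of counting active kiocbs up
      from zero, so allocation and completion become take/return of a slot:
      
          atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
      
          /* aio_get_req(): take a slot, or fail if none are free */
          if (atomic_dec_if_positive(&ctx->reqs_available) < 0)
                  return NULL;
      
          /* aio_complete() / error paths: return the slot */
          atomic_inc(&ctx->reqs_available);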
  3. 17 July 2013, 1 commit
  4. 16 July 2013, 2 commits
  5. 14 July 2013, 3 commits
  6. 13 July 2013, 5 commits
  7. 12 July 2013, 5 commits
    • ext4: rate limit printk in buffer_io_error() · e8974c39
      Anatol Pomozov committed
      If there are a lot of outstanding buffered IOs when a device is
      taken offline (due to hardware errors, etc.), ext4_end_bio() prints
      out a message for each failed logical block. While this is desirable,
      we see thousands of such lines being printed out before the
      serial console gets overwhelmed, causing ext4_end_bio() to wait for
      the printk to complete.
      
      This in itself isn't a disaster, except for the detail that this
      function is being called with the queue lock held.
      This causes any other function in the block layer
      to spin on its spin_lock_irqsave while the serial console is
      draining. If the NMI watchdog is enabled on this machine then it
      eventually comes along and shoots the machine in the head.
      
      The end result is that losing any one disk causes the machine to
      go down. This patch rate-limits the printk as a band-aid for the
      problem.
      
      Tested: xfstests
      Change-Id: I8ab5690dcf4f3a67e78be147d45e489fdf4a88d8
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
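      
      The fix is essentially a switch to the kernel's ratelimited printk
      helper; a sketch (the message text is illustrative, not the exact
      ext4 string):
      
          #include <linux/ratelimit.h>
      
          static void buffer_io_error(struct buffer_head *bh)
          {
                  char b[BDEVNAME_SIZE];
      
                  printk_ratelimited(KERN_ERR
                          "Buffer I/O error on device %s, logical block %llu\n",
                          bdevname(bh->b_bdev, b),
                          (unsigned long long)bh->b_blocknr);
          }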
    • CIFS: Fix a deadlock when a file is reopened · 689c3db4
      Pavel Shilovsky committed
      If we request reading or writing on a file that needs to be
      reopened, it causes a deadlock: we are already holding the rw
      semaphore for reading and then we try to acquire it for writing
      in cifs_relock_file. Fix this by acquiring the semaphore for
      reading in cifs_relock_file, since we don't make any changes to
      locks and don't need write access.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
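      
      Sketched, the fix is a single lock-mode change in cifs_relock_file()
      (field names assumed from cifsInodeInfo; the lock-pushing body is
      elided):
      
          static int cifs_relock_file(struct cifsFileInfo *cfile)
          {
                  struct cifsInodeInfo *cinode = CIFS_I(cfile->dentry->d_inode);
                  int rc = 0;
      
                  down_read(&cinode->lock_sem);   /* was down_write() */
                  /* ... re-send the existing byte-range locks ... */
                  up_read(&cinode->lock_sem);
                  return rc;
          }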
    • CIFS: Reopen the file if reconnect durable handle failed · b33fcf1c
      Pavel Shilovsky committed
      This is a follow-on to patch 8/8 of the durable handles series.
      It fixes the problem where the durable file handle timeout has
      expired on the server and reopen returns -ENOENT for such files.
      Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • ext4: don't show usrquota/grpquota twice in /proc/mounts · ad065dd0
      Theodore Ts'o committed
      We now print mount options in a generic fashion in
      ext4_show_options(), so we shouldn't be explicitly printing the
      {usr,grp}quota options in ext4_show_quota_options().
      
      Without this patch, /proc/mounts can look like this:
      
       /dev/vdb /vdb ext4 rw,relatime,quota,usrquota,data=ordered,usrquota 0 0
                                            ^^^^^^^^              ^^^^^^^^
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • xfs: Fix the logic check for all quotas being turned off · c31ad439
      Chandra Seetharaman committed
      During the review of the separate pquota inode patches, David noticed
      that the test to detect all quotas being turned off was
      incorrect, and hence the block was not freeing all the quota
      information.
      
      The check made sense on Irix, but on Linux quotas are turned off
      one at a time, which makes the test invalid for Linux.
      
      This problem has existed since XFS was ported to Linux.
      
      David suggested fixing the problem by detecting when all quotas are
      turned off by checking m_qflags.
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
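      
      The corrected test, sketched (simplified; XFS_ALL_QUOTA_ACCT is the
      mask of the per-type accounting bits kept in m_qflags):
      
          /*
           * On Linux quota is switched off one type at a time, so "all
           * quotas are off" must be read back from m_qflags rather than
           * inferred from the flags passed to this quotaoff call.
           */
          if ((mp->m_qflags & XFS_ALL_QUOTA_ACCT) == 0)
                  xfs_qm_destroy_quotainfo(mp);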
  8. 11 July 2013, 13 commits