1. 08 5月, 2013 10 次提交
    • K
      aio: make aio_read_evt() more efficient, convert to hrtimers · a31ad380
      Kent Overstreet 提交于
      Previously, aio_read_event() pulled a single completion off the
      ringbuffer at a time, locking and unlocking each time.  Change it to
      pull off as many events as it can at a time, and copy them directly to
      userspace.
      
      This also fixes a bug where if copying the event to userspace failed,
      we'd lose the event.
      
      Also convert it to wait_event_interruptible_hrtimeout(), which
      simplifies it quite a bit.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a31ad380
    • K
      aio: refcounting cleanup · 36f55889
      Kent Overstreet 提交于
      The usage of ctx->dead was fubar - it makes no sense to explicitly check
      it all over the place, especially when we're already using RCU.
      
      Now, ctx->dead only indicates whether we've dropped the initial
      refcount. The new teardown sequence is:
      
        set ctx->dead
        hlist_del_rcu();
        synchronize_rcu();
      
      Now we know no system calls can take a new ref, and it's safe to drop
      the initial ref:
      
        put_ioctx();
      
      We also need to ensure there are no more outstanding kiocbs.  This was
      done incorrectly - it was being done in kill_ctx(), and before dropping
      the initial refcount.  At this point, other syscalls may still be
      submitting kiocbs!
      
      Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
      kioctx->users has dropped to 0 and we know no more iocbs could be
      submitted.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36f55889
    • K
      aio: make aio_put_req() lockless · 11599eba
      Kent Overstreet 提交于
      Freeing a kiocb needed to touch the kioctx for three things:
      
       * Pull it off the reqs_active list
       * Decrementing reqs_active
       * Issuing a wakeup, if the kioctx was in the process of being freed.
      
      This patch moves these to aio_complete(), for a couple reasons:
      
       * aio_complete() already has to issue the wakeup, so if we drop the
         kioctx refcount before aio_complete does its wakeup we don't have to
         do it twice.
       * aio_complete currently has to take the kioctx lock, so it makes sense
         for it to pull the kiocb off the reqs_active list too.
       * A later patch is going to change reqs_active to include unreaped
         completions - this will mean allocating a kiocb doesn't have to look
         at the ringbuffer. So taking the decrement of reqs_active out of
         kiocb_free() is useful prep work for that patch.
      
      This doesn't really affect cancellation, since existing (usb) code that
      implements a cancel function still calls aio_complete() - we just have
      to make sure that aio_complete does the necessary teardown for cancelled
      kiocbs.
      
      It does affect code paths where we free kiocbs that were never
      submitted; they need to decrement reqs_active and pull the kiocb off the
      reqs_active list.  This occurs in two places: kiocb_batch_free(), which
      is going away in a later patch, and the error path in io_submit_one.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      11599eba
    • K
      aio: do fget() after aio_get_req() · 1d98ebfc
      Kent Overstreet 提交于
      aio_get_req() will fail if we have the maximum number of requests
      outstanding, which depending on the application may not be uncommon.  So
      avoid doing an unnecessary fget().
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d98ebfc
    • K
      aio: dprintk() -> pr_debug() · caf4167a
      Kent Overstreet 提交于
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      caf4167a
    • K
      aio: move private stuff out of aio.h · 4e179bca
      Kent Overstreet 提交于
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e179bca
    • K
      aio: add kiocb_cancel() · 906b973c
      Kent Overstreet 提交于
      Minor refactoring, to get rid of some duplicated code
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      906b973c
    • K
      aio: kill return value of aio_complete() · 2d68449e
      Kent Overstreet 提交于
      Nothing used the return value, and it probably wasn't possible to use it
      safely for the locked versions (aio_complete(), aio_put_req()).  Just
      kill it.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Acked-by: NZach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d68449e
    • Z
      aio: remove retry-based AIO · 41003a7b
      Zach Brown 提交于
      This removes the retry-based AIO infrastructure now that nothing in tree
      is using it.
      
      We want to remove retry-based AIO because it is fundemantally unsafe.
      It retries IO submission from a kernel thread that has only assumed the
      mm of the submitting task.  All other task_struct references in the IO
      submission path will see the kernel thread, not the submitting task.
      This design flaw means that nothing of any meaningful complexity can use
      retry-based AIO.
      
      This removes all the code and data associated with the retry machinery.
      The most significant benefit of this is the removal of the locking
      around the unused run list in the submission path.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Signed-off-by: NZach Brown <zab@redhat.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41003a7b
    • N
      hugetlbfs: fix mmap failure in unaligned size request · af73e4d9
      Naoya Horiguchi 提交于
      The current kernel returns -EINVAL unless a given mmap length is
      "almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
      given length is passed to vm_mmap_pgoff() as it is without being aligned
      with hugepage boundary.
      
      This is a regression introduced in commit 40716e29 ("hugetlbfs: fix
      alignment of huge page requests"), where alignment code is pushed into
      hugetlb_file_setup() and the variable len in caller side is not changed.
      
      To fix this, this patch partially reverts that commit, and adds
      alignment code in caller side.  And it also introduces hstate_sizelog()
      in order to get proper hstate to specified hugepage size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
      
      [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: <iceman_dvd@yahoo.com>
      Cc: Steven Truelove <steven.truelove@utoronto.ca>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af73e4d9
  2. 07 5月, 2013 2 次提交
  3. 05 5月, 2013 20 次提交
  4. 02 5月, 2013 8 次提交
    • A
      ceph: use ceph_create_snap_context() · 812164f8
      Alex Elder 提交于
      Now that we have a library routine to create snap contexts, use it.
      
      This is part of:
          http://tracker.ceph.com/issues/4857Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      812164f8
    • A
      libceph: kill off osd data write_request parameters · 406e2c9f
      Alex Elder 提交于
      In the incremental move toward supporting distinct data items in an
      osd request some of the functions had "write_request" parameters to
      indicate, basically, whether the data belonged to in_data or the
      out_data.  Now that we maintain the data fields in the op structure
      there is no need to indicate the direction, so get rid of the
      "write_request" parameters.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      406e2c9f
    • R
      ceph: fix printk format warnings in file.c · ac7f29bf
      Randy Dunlap 提交于
      Fix printk format warnings by using %zd for 'ssize_t' variables:
      
      fs/ceph/file.c:751:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]
      fs/ceph/file.c:762:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc:	ceph-devel@vger.kernel.org
      Signed-off-by: NSage Weil <sage@inktank.com>
      ac7f29bf
    • Y
      ceph: fix race between writepages and truncate · 1ac0fc8a
      Yan, Zheng 提交于
      ceph_writepages_start() reads inode->i_size in two places. It can get
      different values between successive read, because truncate can change
      inode->i_size at any time. The race can lead to mismatch between data
      length of osd request and pages marked as writeback. When osd request
      finishes, it clear writeback page according to its data length. So
      some pages can be left in writeback state forever. The fix is only
      read inode->i_size once, save its value to a local variable and use
      the local variable when i_size is needed.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      1ac0fc8a
    • Y
      ceph: apply write checks in ceph_aio_write · 03d254ed
      Yan, Zheng 提交于
      copy write checks in __generic_file_aio_write to ceph_aio_write.
      To make these checks cover sync write path.
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      03d254ed
    • Y
      ceph: take i_mutex before getting Fw cap · 37505d57
      Yan, Zheng 提交于
      There is deadlock as illustrated bellow. The fix is taking i_mutex
      before getting Fw cap reference.
      
            write                    truncate                 MDS
      ---------------------     --------------------      --------------
      get Fw cap
                                lock i_mutex
      lock i_mutex (blocked)
                                request setattr.size  ->
                                                      <-   revoke Fw cap
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      37505d57
    • A
      libceph: change how "safe" callback is used · 26be8808
      Alex Elder 提交于
      An osd request currently has two callbacks.  They inform the
      initiator of the request when we've received confirmation for the
      target osd that a request was received, and when the osd indicates
      all changes described by the request are durable.
      
      The only time the second callback is used is in the ceph file system
      for a synchronous write.  There's a race that makes some handling of
      this case unsafe.  This patch addresses this problem.  The error
      handling for this callback is also kind of gross, and this patch
      changes that as well.
      
      In ceph_sync_write(), if a safe callback is requested we want to add
      the request on the ceph inode's unsafe items list.  Because items on
      this list must have their tid set (by ceph_osd_start_request()), the
      request added *after* the call to that function returns.  The
      problem with this is that there's a race between starting the
      request and adding it to the unsafe items list; the request may
      already be complete before ceph_sync_write() even begins to put it
      on the list.
      
      To address this, we change the way the "safe" callback is used.
      Rather than just calling it when the request is "safe", we use it to
      notify the initiator the bounds (start and end) of the period during
      which the request is *unsafe*.  So the initiator gets notified just
      before the request gets sent to the osd (when it is "unsafe"), and
      again when it's known the results are durable (it's no longer
      unsafe).  The first call will get made in __send_request(), just
      before the request message gets sent to the messenger for the first
      time.  That function is only called by __send_queued(), which is
      always called with the osd client's request mutex held.
      
      We then have this callback function insert the request on the ceph
      inode's unsafe list when we're told the request is unsafe.  This
      will avoid the race because this call will be made under protection
      of the osd client's request mutex.  It also nicely groups the setup
      and cleanup of the state associated with managing unsafe requests.
      
      The name of the "safe" callback field is changed to "unsafe" to
      better reflect its new purpose.  It has a Boolean "unsafe" parameter
      to indicate whether the request is becoming unsafe or is now safe.
      Because the "msg" parameter wasn't used, we drop that.
      
      This resolves the original problem reportedin:
          http://tracker.ceph.com/issues/4706Reported-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      26be8808
    • A
      ceph: let osd client clean up for interrupted request · 7d7d51ce
      Alex Elder 提交于
      In ceph_sync_write(), if a safe callback is supplied with a request,
      and an error is returned by ceph_osdc_wait_request(), a block of
      code is executed to remove the request from the unsafe writes list
      and drop references to capabilities acquired just prior to a call to
      ceph_osdc_wait_request().
      
      The only function used for this callback is sync_write_commit(),
      and it does *exactly* what that block of error handling code does.
      
      Now in ceph_osdc_wait_request(), if an error occurs (due to an
      interupt during a wait_for_completion_interruptible() call),
      complete_request() gets called, and that calls the request's
      safe_callback method if it's defined.
      
      So this means that this cleanup activity gets called twice in this
      case, which is erroneous (and in fact leads to a crash).
      
      Fix this by just letting the osd client handle the cleanup in
      the event of an interrupt.
      
      This resolves one problem mentioned in:
          http://tracker.ceph.com/issues/4706Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NYan, Zheng <zheng.z.yan@intel.com>
      7d7d51ce