1. 03 4月, 2014 4 次提交
  2. 30 3月, 2014 1 次提交
    • A
      rbd: drop an unsafe assertion · 638c323c
      Alex Elder 提交于
      Olivier Bonvalet reported having repeated crashes due to a failed
      assertion he was hitting in rbd_img_obj_callback():
      
          Assertion failure in rbd_img_obj_callback() at line 2165:
      	rbd_assert(which >= img_request->next_completion);
      
      With a lot of help from Olivier with reproducing the problem
      we were able to determine the object and image requests had
      already been completed (and often freed) at the point the
      assertion failed.
      
      There was a great deal of discussion on the ceph-devel mailing list
      about this.  The problem only arose when there were two (or more)
      object requests in an image request, and the problem was always
      seen when the second request was being completed.
      
      The problem is due to a race in the window between setting the
      "done" flag on an object request and checking the image request's
      next completion value.  When the first object request completes, it
      checks to see if its successor request is marked "done", and if
      so, that request is also completed.  In the process, the image
      request's next_completion value is updated to reflect that both
      the first and second requests are completed.  By the time the
      second request is able to check the next_completion value, it
      has been set to a value *greater* than its own "which" value,
      which caused an assertion to fail.
      
      Fix this problem by skipping over any completion processing
      unless the completing object request is the next one expected.
      Test only for inequality (not >=), and eliminate the bad
      assertion.
      Tested-by: NOlivier Bonvalet <ob@daevel.fr>
      Signed-off-by: NAlex Elder <elder@linaro.org>
      Reviewed-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NIlya Dryomov <ilya.dryomov@inktank.com>
      638c323c
  3. 28 1月, 2014 4 次提交
  4. 01 1月, 2014 10 次提交
  5. 24 11月, 2013 3 次提交
    • K
      rbd: Refactor bio cloning · 5341a627
      Kent Overstreet 提交于
      Now that we've got drivers converted to the new immutable bvec
      primitives, bio splitting becomes much easier - this is how the new
      bio_split() will work. (Someone more familiar with the ceph code could
      probably use bio_clone_fast() instead of bio_clone() here).
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      5341a627
    • K
      block: Convert bio_for_each_segment() to bvec_iter · 7988613b
      Kent Overstreet 提交于
      More prep work for immutable biovecs - with immutable bvecs drivers
      won't be able to use the biovec directly, they'll need to use helpers
      that take into account bio->bi_iter.bi_bvec_done.
      
      This updates callers for the new usage without changing the
      implementation yet.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: support@lsi.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      Cc: cbe-oss-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: DL-MPTFusionLinux@lsi.com
      Cc: linux-scsi@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: cluster-devel@redhat.com
      Cc: linux-mm@kvack.org
      Acked-by: NGeoff Levand <geoff@infradead.org>
      7988613b
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  6. 12 9月, 2013 1 次提交
  7. 10 9月, 2013 5 次提交
    • J
      rbd: fix error handling from rbd_snap_name() · da6a6b63
      Josh Durgin 提交于
      rbd_snap_name() calls rbd_dev_v{1,2}_snap_name() depending on the
      format of the image. The format 1 version returns NULL on error, which
      is handled by the caller. The format 2 version returns an ERR_PTR,
      which the caller of rbd_snap_name() does not expect.
      
      Fortunately this is unlikely to occur in practice because
      rbd_snap_id_by_name() is called before rbd_snap_name(). This would hit
      similar errors to rbd_snap_name() (like the snapshot not existing) and
      return early, so rbd_snap_name() would not hit an error unless the
      snapshot was removed between the two calls or memory was exhausted.
      
      Use an ERR_PTR in rbd_dev_v1_snap_name() so that the specific error
      can be propagated, and it is consistent with rbd_dev_v2_snap_name().
      Handle the ERR_PTR in the only rbd_snap_name() caller.
      Suggested-by: NAlex Elder <alex.elder@linaro.org>
      Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      da6a6b63
    • J
      rbd: ignore unmapped snapshots that no longer exist · efadc98a
      Josh Durgin 提交于
      This prevents erroring out while adding a device when a snapshot
      unrelated to the current mapping is deleted between reading the
      snapshot context and reading the snapshot names. If the mapped
      snapshot name is not found an error still occurs as usual.
      Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      efadc98a
    • J
      rbd: fix use-after free of rbd_dev->disk · 9875201e
      Josh Durgin 提交于
      Removing a device deallocates the disk, unschedules the watch, and
      finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
      from the watch callback, updates the disk size and rbd_dev
      structure. With no locking between them, rbd_dev_refresh() may use the
      device or rbd_dev after they've been freed.
      
      To fix this, check whether RBD_DEV_FLAG_REMOVING is set before
      updating the disk size in rbd_dev_refresh(). In order to prevent a
      race where rbd_dev_refresh() is already revalidating the disk when
      rbd_remove() is called, move the call to rbd_bus_del_dev() after the
      watch is unregistered and all notifies are complete. It's safe to
      defer deleting this structure because no new requests can be submitted
      once the RBD_DEV_FLAG_REMOVING is set, since the device cannot be
      opened.
      
      Fixes: http://tracker.ceph.com/issues/5636Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      9875201e
    • J
      rbd: make rbd_obj_notify_ack() synchronous · 20e0af67
      Josh Durgin 提交于
      The only user of rbd_obj_notify_ack() is rbd_watch_cb(). It used
      asynchronously with no tracking of when the notify ack completes, so
      it may still be in progress when the osd_client is shut down.  This
      results in a BUG() since the osd client assumes no requests are in
      flight when it stops. Since all notifies are flushed before the
      osd_client is stopped, waiting for the notify ack to complete before
      returning from the watch callback ensures there are no notify acks in
      flight during shutdown.
      
      Rename rbd_obj_notify_ack() to rbd_obj_notify_ack_sync() to reflect
      its new synchronous nature.
      Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      20e0af67
    • J
      rbd: complete notifies before cleaning up osd_client and rbd_dev · 9abc5990
      Josh Durgin 提交于
      To ensure rbd_dev is not used after it's released, flush all pending
      notify callbacks before calling rbd_dev_image_release(). No new
      notifies can be added to the queue at this point because the watch has
      already be unregistered with the osd_client.
      Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      9abc5990
  8. 04 9月, 2013 3 次提交
  9. 28 8月, 2013 1 次提交
  10. 10 8月, 2013 1 次提交
  11. 04 7月, 2013 7 次提交
    • S
      rbd: fix a couple warnings · e976cad0
      Sage Weil 提交于
      gcc isn't quite smart enough and generates these warnings:
      
      drivers/block/rbd.c: In function 'rbd_img_request_fill':
      drivers/block/rbd.c:1266:22: warning: 'bio_list' may be used uninitialized in this function [-Wmaybe-uninitialized]
      drivers/block/rbd.c:2186:14: note: 'bio_list' was declared here
      drivers/block/rbd.c:2247:10: warning: 'pages' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      even though they are initialized for their respective code paths.
      Signed-off-by: NSage Weil <sage@inktank.com>
      e976cad0
    • A
      rbd: take a little credit · d552c619
      Alex Elder 提交于
      Add a name to the list of authors.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      d552c619
    • A
      rbd: use rwsem to protect header updates · cfbf6377
      Alex Elder 提交于
      Updating an image header needs to be protected to ensure it's
      done consistently.  However distinct headers can be updated
      concurrently without a problem.  Instead of using the global
      control lock to serialize headder updates, just rely on the header
      semaphore.  (It's already used, this just moves it out to cover
      a broader section of the code.)
      
      That leaves the control mutex protecting only the creation of rbd
      clients, so rename it.
      
      This resolves:
          http://tracker.ceph.com/issues/5222Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      cfbf6377
    • A
      rbd: don't hold ctl_mutex to get/put device · 1ba0f1e7
      Alex Elder 提交于
      When an rbd device is first getting mapped, its device registration
      is protected the control mutex.  There is no need to do that though,
      because the device has already been assigned an id that's guaranteed
      to be unique.
      
      An unmap of an rbd device won't proceed if the device has a non-zero
      open count or is already being unmapped.  So there's no need to hold
      the control mutex in that case either.
      
      Finally, an rbd device can't be opened if it is being removed, and
      it won't go away if there is a non-zero open count.  So here too
      there's no need to hold the control mutex while getting or putting a
      reference to an rbd device's Linux device structure.
      
      Drop the mutex calls in these cases.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      1ba0f1e7
    • A
      rbd: protect against concurrent unmaps · 82a442d2
      Alex Elder 提交于
      Make sure two concurrent unmap operations on the same rbd device
      won't collide, by only proceeding with the removal and cleanup of a
      device if is not already underway.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      82a442d2
    • A
      rbd: set removing flag while holding list lock · 751cc0e3
      Alex Elder 提交于
      When unmapping a device, its id is supplied, and that is used to
      look up which rbd device should be unmapped.  Looking up the
      device involves searching the rbd device list while holding
      a spinlock that protects access to that list.
      
      Currently all of this is done under protection of the control lock,
      but that protection is going away soon.  To ensure the rbd_dev is
      still valid (still on the list) while setting its REMOVING flag, do
      so while still holding the list lock.  To do so, get rid of
      __rbd_get_dev(), and open code what it did in the one place it
      was used.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      751cc0e3
    • A
      rbd: protect against duplicate client creation · 08f75463
      Alex Elder 提交于
      If more than one rbd image has the same ceph cluster configuration
      (same options, same set of monitors, same keys) they normally share
      a single rbd client.
      
      When an image is getting mapped, rbd looks to see if an existing
      client can be used, and creates a new one if not.
      
      The lookup and creation are not done under a common lock though, so
      mapping two images concurrently could lead to duplicate clients
      getting set up needlessly.  This isn't a major problem, but it's
      wasteful and different from what's intended.
      
      This patch fixes that by using the control mutex to protect
      both the lookup and (if needed) creation of the client.  It
      was previously used just when creating.
      
      This resolves:
          http://tracker.ceph.com/issues/3094Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
      08f75463