1. 25 8月, 2016 9 次提交
  2. 09 8月, 2016 2 次提交
  3. 28 7月, 2016 2 次提交
  4. 08 6月, 2016 1 次提交
  5. 26 5月, 2016 10 次提交
    • I
      libceph: replace ceph_monc_request_next_osdmap() · 7cca78c9
      Ilya Dryomov 提交于
      ... with a wrapper around maybe_request_map() - no need for two
      osdmap-specific functions.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      7cca78c9
    • I
      libceph: async MON client generic requests · d0b19705
      Ilya Dryomov 提交于
      For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
      messages asynchronously and get a callback on completion.  Refactor MON
      client to allow firing off generic requests asynchronously and add an
      async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
      switched over and remains sync.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      d0b19705
    • I
      libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61
      Ilya Dryomov 提交于
      This adds support and switches rbd to a new, more reliable version of
      watch/notify protocol.  As with the OSD client update, this is mostly
      about getting the right structures linked into the right places so that
      reconnects are properly sent when needed.  watch/notify v2 also
      requires sending regular pings to the OSDs - send_linger_ping().
      
      A major change from the old watch/notify implementation is the
      introduction of ceph_osd_linger_request - linger requests no longer
      piggy back on ceph_osd_request.  ceph_osd_event has been merged into
      ceph_osd_linger_request.
      
      All the details are now hidden within libceph, the interface consists
      of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
      ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
      the lifetime management simple.
      
      ceph_osdc_notify_ack() accepts an optional data payload, which is
      relayed back to the notifier.
      
      Portions of this patch are loosely based on work by Douglas Fuller
      <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      922dab61
    • I
      rbd: rbd_dev_header_unwatch_sync() variant · c525f036
      Ilya Dryomov 提交于
      Introduce __rbd_dev_header_unwatch_sync(), which doesn't flush notify
      callbacks.  This is for the new rados_watcherrcb_t, which would be
      called from a notify callback.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c525f036
    • I
      libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe
      Ilya Dryomov 提交于
      finish_read(), its only user, uses it to get to hdr.data_len, which is
      what ->r_result is set to on success.  This gains us the ability to
      safely call callbacks from contexts other than reply, e.g. map check.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      85e084fe
    • I
      libceph: switch to calc_target(), part 2 · bb873b53
      Ilya Dryomov 提交于
      The crux of this is getting rid of ceph_osdc_build_request(), so that
      MOSDOp can be encoded not before but after calc_target() calculates the
      actual target.  Encoding now happens within ceph_osdc_start_request().
      
      Also nuked is the accompanying bunch of pointers into the encoded
      buffer that was used to update fields on each send - instead, the
      entire front is re-encoded.  If we want to support target->name_len !=
      base->name_len in the future, there is no other way, because oid is
      surrounded by other fields in the encoded buffer.
      
      Encoding OSD ops and adding data items to the request message were
      mixed together in osd_req_encode_op().  While we want to re-encode OSD
      ops, we don't want to add duplicate data items to the message when
      resending, so all call to ceph_osdc_msg_data_add() are factored out
      into a new setup_request_data().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      bb873b53
    • I
      rbd: use header_oid instead of header_name · c41d13a3
      Ilya Dryomov 提交于
      Switch to ceph_object_id and use ceph_oid_aprintf() instead of a bare
      const char *.  This reduces noise in rbd_dev_header_name().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c41d13a3
    • I
      libceph: variable-sized ceph_object_id · d30291b9
      Ilya Dryomov 提交于
      Currently ceph_object_id can hold object names of up to 100
      (CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
      expect one - long rbd image names:
      
      - a format 1 header is named "<imgname>.rbd"
      - an object that points to a format 2 header is named "rbd_id.<imgname>"
      
      We operate on these potentially long-named objects during rbd map, and,
      for format 1 images, during header refresh.  (A format 2 header name is
      a small system-generated string.)
      
      Lift this 100 character limit by making ceph_object_id be able to point
      to an externally-allocated string.  Apart from being able to work with
      almost arbitrarily-long named objects, this allows us to reduce the
      size of ceph_object_id from >100 bytes to 64 bytes.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      d30291b9
    • I
      libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16
      Ilya Dryomov 提交于
      The size of ->r_request and ->r_reply messages depends on the size of
      the object name (ceph_object_id), while the size of ceph_osd_request is
      fixed.  Move message allocation into a separate function that would
      have to be called after ceph_object_id and ceph_object_locator (which
      is also going to become variable in size with RADOS namespaces) have
      been filled in:
      
          req = ceph_osdc_alloc_request(...);
          <fill in req->r_base_oid>
          <fill in req->r_base_oloc>
          ceph_osdc_alloc_messages(req);
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      13d1ad16
    • I
      rbd: get/put img_request in rbd_img_request_submit() · 663ae2cc
      Ilya Dryomov 提交于
      By the time we get to checking for_each_obj_request_safe(img_request)
      terminating condition, all obj_requests may be complete and img_request
      ref, that rbd_img_request_submit() takes away from its caller, may be
      put.  Moving the next_obj_request cursor is then a use-after-free on
      img_request.
      
      It's totally benign, as the value that's read is never used, but
      I think it's still worth fixing.
      
      Cc: Alex Elder <elder@linaro.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      663ae2cc
  6. 28 4月, 2016 2 次提交
    • I
      rbd: report unsupported features to syslog · d3767f0f
      Ilya Dryomov 提交于
      ... instead of just returning an error.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
      d3767f0f
    • I
      rbd: fix rbd map vs notify races · 811c6688
      Ilya Dryomov 提交于
      A while ago, commit 9875201e ("rbd: fix use-after free of
      rbd_dev->disk") fixed rbd unmap vs notify race by introducing
      an exported wrapper for flushing notifies and sticking it into
      do_rbd_remove().
      
      A similar problem exists on the rbd map path, though: the watch is
      registered in rbd_dev_image_probe(), while the disk is set up quite
      a few steps later, in rbd_dev_device_setup().  Nothing prevents
      a notify from coming in and crashing on a NULL rbd_dev->disk:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
          Call Trace:
           [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
           [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
           [<ffffffff8109d5db>] process_one_work+0x17b/0x470
           [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
           [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
           [<ffffffff810a5acf>] kthread+0xcf/0xe0
           [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
           [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
           [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
           [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
          RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
      
      If an error occurs during rbd map, we have to error out, potentially
      tearing down a watch.  Just like on rbd unmap, notifies have to be
      flushed, otherwise rbd_watch_cb() may end up trying to read in the
      image header after rbd_dev_image_release() has run:
      
          Assertion failure in rbd_dev_header_info() at line 4722:
      
           rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
      
          Call Trace:
           [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
           [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
           [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
           [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
           [<ffffffff81107799>] process_one_work+0x689/0x1a80
           [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
           [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
           [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
           [<ffffffff81108c69>] worker_thread+0xd9/0x1320
           [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
           [<ffffffff8111b02d>] kthread+0x21d/0x2e0
           [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
           [<ffffffff82022802>] ret_from_fork+0x22/0x40
           [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
          RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30
      
      To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
      revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
      rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
      header read-in into a critical section.  The latter also happens to
      take care of rbd map foo@bar vs rbd snap rm foo@bar race.
      
      Fixes: http://tracker.ceph.com/issues/15490Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
      811c6688
  7. 06 4月, 2016 1 次提交
  8. 26 3月, 2016 3 次提交
  9. 22 1月, 2016 1 次提交
  10. 04 12月, 2015 1 次提交
  11. 03 11月, 2015 5 次提交
  12. 31 10月, 2015 1 次提交
    • R
      rbd: require stable pages if message data CRCs are enabled · bae818ee
      Ronny Hegewald 提交于
      rbd requires stable pages, as it performs a crc of the page data before
      they are send to the OSDs.
      
      But since kernel 3.9 (patch 1d1d1a76
      "mm: only enforce stable page writes if the backing device requires
      it") it is not assumed anymore that block devices require stable pages.
      
      This patch sets the necessary flag to get stable pages back for rbd.
      
      In a ceph installation that provides multiple ext4 formatted rbd
      devices "bad crc" messages appeared regularly (ca 1 message every 1-2
      minutes on every OSD that provided the data for the rbd) in the
      OSD-logs before this patch. After this patch this messages are pretty
      much gone (only ca 1-2 / month / OSD).
      
      Cc: stable@vger.kernel.org # 3.9+, needs backporting
      Signed-off-by: NRonny Hegewald <Ronny.Hegewald@online.de>
      [idryomov@gmail.com: require stable pages only in crc case, changelog]
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      bae818ee
  13. 24 10月, 2015 2 次提交
    • I
      rbd: prevent kernel stack blow up on rbd map · 6d69bb53
      Ilya Dryomov 提交于
      Mapping an image with a long parent chain (e.g. image foo, whose parent
      is bar, whose parent is baz, etc) currently leads to a kernel stack
      overflow, due to the following recursion in the reply path:
      
        rbd_osd_req_callback()
          rbd_obj_request_complete()
            rbd_img_obj_callback()
              rbd_img_parent_read_callback()
                rbd_obj_request_complete()
                  ...
      
      Limit the parent chain to 16 images, which is ~5K worth of stack.  When
      the above recursion is eliminated, this limit can be lifted.
      
      Fixes: http://tracker.ceph.com/issues/12538
      
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
      6d69bb53
    • I
      rbd: don't leak parent_spec in rbd_dev_probe_parent() · 1f2c6651
      Ilya Dryomov 提交于
      Currently we leak parent_spec and trigger a "parent reference
      underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
      The problem is we take the !parent out_err branch and that only drops
      refcounts; parent_spec that would've been freed had we called
      rbd_dev_unparent() remains and triggers rbd_warn() in
      rbd_dev_parent_put() - at that point we have parent_spec != NULL and
      parent_ref == 0, so counter ends up being -1 after the decrement.
      
      Redo rbd_dev_probe_parent() to fix this.
      
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      1f2c6651