1. 26 May, 2016 34 commits
    • libceph: take osdc->lock in osdmap_show() and dump flags in hex · b4f34795
      Ilya Dryomov committed
      There are now about a dozen CEPH_OSDMAP_* flags.  This is a debugging
      interface, so just dump them in hex instead of spelling each flag out.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: pool deletion detection · 4609245e
      Ilya Dryomov committed
      This adds the "map check" infrastructure for sending osdmap version
      checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
      -ENOENT if the target pool doesn't exist or has just been deleted.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: async MON client generic requests · d0b19705
      Ilya Dryomov committed
      For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
      messages asynchronously and get a callback on completion.  Refactor MON
      client to allow firing off generic requests asynchronously and add an
      async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
      switched over and remains sync.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: support for checking on status of watch · b07d3c4b
      Ilya Dryomov committed
      Implement ceph_osdc_watch_check() to be able to check on the status of
      a watch.  Note that the time it takes for a watch/notify event to get
      delivered through the notify_wq is taken into account.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: support for sending notifies · 19079203
      Ilya Dryomov committed
      Implement ceph_osdc_notify() for sending notifies.

      Because the current messenger can't do read-in into pagelists (it can
      only do write-out from them), I had to go with a page vector for
      a NOTIFY_COMPLETE payload, for now.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61
      Ilya Dryomov committed
      This adds support and switches rbd to a new, more reliable version of
      watch/notify protocol.  As with the OSD client update, this is mostly
      about getting the right structures linked into the right places so that
      reconnects are properly sent when needed.  watch/notify v2 also
      requires sending regular pings to the OSDs - send_linger_ping().
      
      A major change from the old watch/notify implementation is the
      introduction of ceph_osd_linger_request - linger requests no longer
      piggyback on ceph_osd_request.  ceph_osd_event has been merged into
      ceph_osd_linger_request.
      
      All the details are now hidden within libceph; the interface consists
      of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
      ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
      the lifetime management simple.
      
      ceph_osdc_notify_ack() accepts an optional data payload, which is
      relayed back to the notifier.
      
      Portions of this patch are loosely based on work by Douglas Fuller
      <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: wait_request_timeout() · 42b06965
      Ilya Dryomov committed
      The unwatch timeout is currently implemented in rbd.  With
      watch/unwatch code moving into libceph, we are going to need
      a ceph_osdc_wait_request() variant with a timeout.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: request_init() and request_release_checks() · 3540bfdb
      Ilya Dryomov committed
      These are going to be used by request_reinit() code.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: a major OSD client update · 5aea3dcd
      Ilya Dryomov committed
      This is a major sync up, up to ~Jewel.  The highlights are:
      
      - per-session request trees (vs a global per-client tree)
      - per-session locking (vs a global per-client rwlock)
      - homeless OSD session
      - no ad-hoc global per-client lists
      - support for pool quotas
      - foundation for watch/notify v2 support
      - foundation for map check (pool deletion detection) support
      
      The switchover is incomplete: lingering requests can be set up and
      torn down but aren't ever reestablished.  This functionality is
      restored with the introduction of the new lingering infrastructure
      (ceph_osd_linger_request, linger_work, etc) in a later commit.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: protect osdc->osd_lru list with a spinlock · 9dd2845c
      Ilya Dryomov committed
      OSD client is getting moved from the big per-client lock to a set of
      per-session locks.  The big rwlock would only be held for read most of
      the time, so a global osdc->osd_lru needs additional protection.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: allocate ceph_osd with GFP_NOFAIL · 7a28f59b
      Ilya Dryomov committed
      create_osd() is called way too deep in the stack to be able to error
      out in a sane way; a failing create_osd() just messes everything up.
      The current req_notarget list solution is broken - the list is never
      traversed as it's not entirely clear when to do it, I guess.
      
      If we were to start traversing it at regular intervals and retrying
      each request, we wouldn't be far off from what __GFP_NOFAIL is doing,
      so allocate OSD sessions with __GFP_NOFAIL, at least until we come up
      with a better fix.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: osd_init() and osd_cleanup() · 0247a0cf
      Ilya Dryomov committed
      These are going to be used by homeless OSD sessions code.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: handle_one_map() · 42c1b124
      Ilya Dryomov committed
      Separate osdmap handling from decoding and iterating over a bag of maps
      in a fresh MOSDMap message.  This sets up the scene for the updated OSD
      client.
      
      Of particular importance here is the addition of pi->was_full, which
      can be used to answer "did this pool go full -> not-full in this map?".
      This is the key bit for supporting pool quotas.
      
      We won't be able to downgrade map_sem for much longer, so drop
      downgrade_write().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
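The role of pi->was_full in the handle_one_map() commit can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct pool_info`, `POOL_FLAG_FULL`, `apply_new_map()` and `pool_cleared_full()` are hypothetical names, not the libceph definitions. The idea is only that the previous "full" state is recorded before the new map is applied, so the two states can be compared afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_FLAG_FULL (1u << 1)   /* hypothetical flag bit for this sketch */

/* Hypothetical per-pool state: current flags plus the remembered old state. */
struct pool_info {
    unsigned int flags;
    bool was_full;   /* was the pool full under the previous map? */
};

static bool pool_full(const struct pool_info *pi)
{
    return (pi->flags & POOL_FLAG_FULL) != 0;
}

/* Remember the old full state, then apply the flags from the new map. */
static void apply_new_map(struct pool_info *pi, unsigned int new_flags)
{
    assert(pi != NULL);
    pi->was_full = pool_full(pi);
    pi->flags = new_flags;
}

/* Answers "did this pool go full -> not-full in this map?". */
static bool pool_cleared_full(const struct pool_info *pi)
{
    return pi->was_full && !pool_full(pi);
}
```

A caller would check `pool_cleared_full()` right after `apply_new_map()` to decide whether requests that were held back because of a full pool can be resent.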
    • libceph: allocate dummy osdmap in ceph_osdc_init() · e5253a7b
      Ilya Dryomov committed
      This leads to a simpler osdmap handling code, particularly when dealing
      with pi->was_full, which is introduced in a later commit.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: schedule tick from ceph_osdc_init() · fbca9635
      Ilya Dryomov committed
      Both homeless OSD sessions and watch/notify v2, introduced in later
      commits, require periodic ticks which don't depend on ->num_requests.
      Schedule the initial tick from ceph_osdc_init() and reschedule from
      handle_timeout() unconditionally.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: move schedule_delayed_work() in ceph_osdc_init() · b37ee1b9
      Ilya Dryomov committed
      ceph_osdc_stop() isn't called if ceph_osdc_init() fails, so we end up
      with handle_osds_timeout() running on invalid memory if any one of the
      allocations fails.  Call schedule_delayed_work() after everything is
      set up, just before returning.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: redo callbacks and factor out MOSDOpReply decoding · fe5da05e
      Ilya Dryomov committed
      If you specify ACK | ONDISK and set ->r_unsafe_callback, both
      ->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
      very confusing.  Redo this so that only one of them is called:
      
          ->r_unsafe_callback(true), on ack
          ->r_unsafe_callback(false), on commit
      
      or
      
          ->r_callback, on ack|commit
      
      Decode everything in decode_MOSDOpReply() to reduce clutter.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe
      Ilya Dryomov committed
      finish_read(), its only user, uses it to get to hdr.data_len, which is
      what ->r_result is set to on success.  This gains us the ability to
      safely call callbacks from contexts other than reply, e.g. map check.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: switch to calc_target(), part 2 · bb873b53
      Ilya Dryomov committed
      The crux of this is getting rid of ceph_osdc_build_request(), so that
      MOSDOp can be encoded not before but after calc_target() calculates the
      actual target.  Encoding now happens within ceph_osdc_start_request().
      
      Also nuked is the accompanying bunch of pointers into the encoded
      buffer that was used to update fields on each send - instead, the
      entire front is re-encoded.  If we want to support target->name_len !=
      base->name_len in the future, there is no other way, because oid is
      surrounded by other fields in the encoded buffer.
      
      Encoding OSD ops and adding data items to the request message were
      mixed together in osd_req_encode_op().  While we want to re-encode OSD
      ops, we don't want to add duplicate data items to the message when
      resending, so all calls to ceph_osdc_msg_data_add() are factored out
      into a new setup_request_data().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: switch to calc_target(), part 1 · a66dd383
      Ilya Dryomov committed
      Replace __calc_request_pg() and most of __map_request() with
      calc_target() and start using req->r_t.
      
      ceph_osdc_build_request() however still encodes base_oid, because it's
      called before calc_target() is and target_oid is empty at that point in
      time; a printf in osdc_show() also shows base_oid.  This is fixed in
      "libceph: switch to calc_target(), part 2".
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: introduce ceph_osd_request_target, calc_target() · 63244fa1
      Ilya Dryomov committed
      Introduce ceph_osd_request_target, containing all mapping-related
      fields of ceph_osd_request and calc_target() for calculating mappings
      and populating it.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: pi->min_size, pi->last_force_request_resend · 04812acf
      Ilya Dryomov committed
      Add and decode pi->min_size and pi->last_force_request_resend.  These
      are going to be used by calc_target().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: make pgid_cmp() global · f984cb76
      Ilya Dryomov committed
      calc_target() code is going to need to know how to compare PGs.  Take
      lhs and rhs pgid by const * while at it.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
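As a rough illustration of comparing PG ids through const pointers, here is a minimal sketch. The `struct pg_id` layout (a pool number plus a seed) and the ordering (pool first, then seed) are assumptions made for this example, and `pg_id_cmp()` is a hypothetical name rather than the libceph function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PG identifier: pool number plus placement seed. */
struct pg_id {
    uint64_t pool;
    uint32_t seed;
};

/*
 * Three-way comparison taking both sides by const pointer.
 * Returns -1, 0 or 1, ordering by pool first and then by seed.
 */
static int pg_id_cmp(const struct pg_id *lhs, const struct pg_id *rhs)
{
    assert(lhs != NULL && rhs != NULL);

    if (lhs->pool < rhs->pool)
        return -1;
    if (lhs->pool > rhs->pool)
        return 1;
    if (lhs->seed < rhs->seed)
        return -1;
    if (lhs->seed > rhs->seed)
        return 1;
    return 0;
}
```

Taking the arguments by const pointer avoids copying the struct and lets the same comparator be used on ids embedded in read-only structures.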
    • libceph: rename ceph_calc_pg_primary() · f81f1633
      Ilya Dryomov committed
      Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
      emphasise that it returns the acting primary.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: ceph_osds, ceph_pg_to_up_acting_osds() · 6f3bfd45
      Ilya Dryomov committed
      Knowing just the acting set isn't enough: we need to be able to record
      the up set as well to detect interval changes.  This means returning (up[],
      up_len, up_primary, acting[], acting_len, acting_primary) and passing
      it around.  Introduce and switch to ceph_osds to help with that.
      
      Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return
      both up and acting sets from it.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
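Why both sets matter can be shown with a deliberately simplified sketch. `struct osd_set`, `MAX_OSDS`, `osd_set_equal()` and `interval_changed()` are hypothetical names, and the check below only compares set membership and primaries; it is an assumption for illustration and omits whatever other conditions the real interval-change logic considers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OSDS 16   /* illustrative bound, not taken from the source */

/* Hypothetical container for one set of OSDs plus its primary. */
struct osd_set {
    int osds[MAX_OSDS];
    int size;
    int primary;
};

static bool osd_set_equal(const struct osd_set *a, const struct osd_set *b)
{
    assert(a->size >= 0 && a->size <= MAX_OSDS);
    assert(b->size >= 0 && b->size <= MAX_OSDS);

    return a->size == b->size && a->primary == b->primary &&
           memcmp(a->osds, b->osds,
                  (size_t)a->size * sizeof(a->osds[0])) == 0;
}

/* Simplified rule: a new interval starts when either set changes. */
static bool interval_changed(const struct osd_set *old_up,
                             const struct osd_set *old_acting,
                             const struct osd_set *new_up,
                             const struct osd_set *new_acting)
{
    return !osd_set_equal(old_up, new_up) ||
           !osd_set_equal(old_acting, new_acting);
}
```

With only the acting set recorded, a change confined to the up set would go unnoticed, which is the case the commit message is concerned with.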
    • libceph: rename ceph_oloc_oid_to_pg() · d9591f5e
      Ilya Dryomov committed
      Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg().  Emphasise
      that a raw PG is returned, and return -ENOENT instead of -EIO if the
      pool doesn't exist.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: DEFINE_RB_FUNCS macro · fcd00b68
      Ilya Dryomov committed
      Given
      
          struct foo {
              u64 id;
              struct rb_node bar_node;
          };
      
      generate insert_bar(), erase_bar() and lookup_bar() functions with
      
          DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)
      
      The key is assumed to be an integer (u64, int, etc), compared with
      < and >.  nodefld has to be initialized with RB_CLEAR_NODE().
      
      Start using it for MDS, MON and OSD requests and OSD sessions.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
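The function-generating pattern described for DEFINE_RB_FUNCS can be tried out in userspace with a rough analogue. This sketch is not the kernel macro: it assumes a plain, unbalanced binary search tree (`struct bst_node`) in place of the kernel's red-black tree, generates only insert and lookup helpers, and uses hypothetical names (`DEFINE_BST_FUNCS`, `bst_entry`). It keeps the same argument order and the same assumption of an integer key compared with < and >.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rb_node: no colouring or rebalancing. */
struct bst_node {
    struct bst_node *left, *right;
};

/* Map an embedded node back to the structure that contains it. */
#define bst_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generate insert_<name>() and lookup_<name>() for an integer key field. */
#define DEFINE_BST_FUNCS(name, type, keyfld, nodefld)                      \
static void insert_##name(struct bst_node **root, type *t)                \
{                                                                          \
    struct bst_node **n = root;                                            \
                                                                           \
    t->nodefld.left = t->nodefld.right = NULL;                             \
    while (*n) {                                                           \
        type *cur = bst_entry(*n, type, nodefld);                          \
                                                                           \
        assert(t->keyfld != cur->keyfld); /* duplicate keys are a bug */   \
        n = t->keyfld < cur->keyfld ? &(*n)->left : &(*n)->right;          \
    }                                                                      \
    *n = &t->nodefld;                                                      \
}                                                                          \
                                                                           \
static type *lookup_##name(struct bst_node *root, uint64_t key)            \
{                                                                          \
    struct bst_node *n = root;                                             \
                                                                           \
    while (n) {                                                            \
        type *cur = bst_entry(n, type, nodefld);                           \
                                                                           \
        if (key < cur->keyfld)                                             \
            n = n->left;                                                   \
        else if (key > cur->keyfld)                                        \
            n = n->right;                                                  \
        else                                                               \
            return cur;                                                    \
    }                                                                      \
    return NULL;                                                           \
}

struct foo {
    uint64_t id;
    struct bst_node bar_node;
};

/* Expands to insert_bar() and lookup_bar() operating on struct foo. */
DEFINE_BST_FUNCS(bar, struct foo, id, bar_node)
```

One macro invocation per (structure, key, node field) triple replaces a hand-written insert and lookup pair, which is the duplication the commit removes for MDS, MON and OSD requests and OSD sessions.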
    • libceph: open-code remove_{all,old}_osds() · 42a2c09f
      Ilya Dryomov committed
      They are called only once, from ceph_osdc_stop() and
      handle_osds_timeout() respectively.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: nuke unused fields and functions · 0c0a8de1
      Ilya Dryomov committed
      Either unused or useless:
      
          osdmap->mkfs_epoch
          osd->o_marked_for_keepalive
          monc->num_generic_requests
          osdc->map_waiters
          osdc->last_requested_map
          osdc->timeout_tid
      
          osd_req_op_cls_response_data()
      
          osdmap_apply_incremental() @msgr arg
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: variable-sized ceph_object_id · d30291b9
      Ilya Dryomov committed
      Currently ceph_object_id can hold object names of up to 100
      (CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases
      except one - long rbd image names:
      
      - a format 1 header is named "<imgname>.rbd"
      - an object that points to a format 2 header is named "rbd_id.<imgname>"
      
      We operate on these potentially long-named objects during rbd map, and,
      for format 1 images, during header refresh.  (A format 2 header name is
      a small system-generated string.)
      
      Lift this 100 character limit by making ceph_object_id be able to point
      to an externally-allocated string.  Apart from being able to work with
      almost arbitrarily-long named objects, this allows us to reduce the
      size of ceph_object_id from >100 bytes to 64 bytes.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
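The "inline buffer or externally-allocated string" idea from the variable-sized ceph_object_id commit can be sketched as follows. The struct layout, the 52-byte inline capacity and the `oid_*` helper names are assumptions for illustration and are not taken from the source. With 8-byte pointers this particular layout would come to 64 bytes, in line with the size quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define OID_INLINE_LEN 52   /* assumed inline capacity, illustrative only */

/* Hypothetical object id: short names are stored inline, long names
 * are heap-allocated and referenced through ->name. */
struct object_id {
    char *name;                        /* inline_name or a heap buffer */
    char inline_name[OID_INLINE_LEN];
    int name_len;
};

static void oid_init(struct object_id *oid)
{
    oid->name = oid->inline_name;
    oid->inline_name[0] = '\0';
    oid->name_len = 0;
}

/* Free an externally allocated name, if any, and reset to empty. */
static void oid_destroy(struct object_id *oid)
{
    if (oid->name != oid->inline_name)
        free(oid->name);
    oid_init(oid);
}

/* Set the name; returns 0 on success, -1 if allocation fails. */
static int oid_set_name(struct object_id *oid, const char *name)
{
    size_t len;

    assert(oid != NULL && name != NULL);
    len = strlen(name);

    oid_destroy(oid);
    if (len + 1 > sizeof(oid->inline_name)) {
        char *ext = malloc(len + 1);

        if (!ext)
            return -1;
        oid->name = ext;
    }
    memcpy(oid->name, name, len + 1);
    oid->name_len = (int)len;
    return 0;
}
```

Callers are expected to pair `oid_init()` with `oid_destroy()`, since a long name owns heap memory that an on-stack copy of the struct would otherwise leak.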
    • libceph: change how osd_op_reply message size is calculated · 711da55d
      Ilya Dryomov committed
      For a message pool message, preallocate a page, just like we do for
      osd_op.  For a normal message, take ceph_object_id into account and
      don't bother subtracting CEPH_OSD_SLAB_OPS ceph_osd_ops.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16
      Ilya Dryomov committed
      The size of ->r_request and ->r_reply messages depends on the size of
      the object name (ceph_object_id), while the size of ceph_osd_request is
      fixed.  Move message allocation into a separate function that would
      have to be called after ceph_object_id and ceph_object_locator (which
      is also going to become variable in size with RADOS namespaces) have
      been filled in:
      
          req = ceph_osdc_alloc_request(...);
          <fill in req->r_base_oid>
          <fill in req->r_base_oloc>
          ceph_osdc_alloc_messages(req);
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: grab snapc in ceph_osdc_alloc_request() · 84127282
      Ilya Dryomov committed
      ceph_osdc_build_request() is going away.  Grab snapc and initialize
      ->r_snapid in ceph_osdc_alloc_request().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: make ceph_osdc_put_request() accept NULL · 3ed97d63
      Ilya Dryomov committed
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  2. 15 May, 2016 2 commits
  3. 12 May, 2016 2 commits
    • gre: do not keep the GRE header around in collect metadata mode · e271c7b4
      Jiri Benc committed
      For ipgre interface in collect metadata mode, it doesn't make sense for the
      interface to be of ARPHRD_IPGRE type. The outer header of received packets
      is not needed, as all the information from it is present in metadata_dst. We
      already don't set ipgre_header_ops for collect metadata interfaces, which is
      the only consumer of mac_header pointing to the outer IP header.
      
      Just set the interface type to ARPHRD_NONE in collect metadata mode for
      ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset
      mac_header.
      
      Fixes: a64b04d8 ("gre: do not assign header_ops in collect metadata mode")
      Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Fix cached ct with helper. · 16ec3d4f
      Joe Stringer committed
      When using conntrack helpers from OVS, a common configuration is to
      perform a lookup without specifying a helper, then go through a
      firewalling policy, only to decide to attach a helper afterwards.
      
      In this case, the initial lookup will cause a ct entry to be attached to
      the skb, then the later commit with helper should attach the helper and
      confirm the connection. However, the helper attachment has been missing.
      If the user has enabled automatic helper attachment, then this issue
      will be masked as it will be applied in init_conntrack(). It is also
      masked if the action is executed from ovs_packet_cmd_execute() as that
      will construct a fresh skb.
      
      This patch fixes the issue by making an explicit call to try to assign
      the helper if there is a discrepancy between the action's helper and the
      current skb->nfct.
      
      Fixes: cae3a262 ("openvswitch: Allow attaching helpers to ct action")
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 May, 2016 2 commits