1. 22 7月, 2016 1 次提交
    • I
      libceph: apply new_state before new_up_client on incrementals · 930c5328
      Ilya Dryomov 提交于
      Currently, osd_weight and osd_state fields are updated in the encoding
      order.  This is wrong, because an incremental map may look like e.g.
      
          new_up_client: { osd=6, addr=... } # set osd_state and addr
          new_state: { osd=6, xorstate=EXISTS } # clear osd_state
      
      Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
      applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
      on with the new_state update, we flip EXISTS and leave osd6 in a weird
      "!EXISTS but UP" state.  A non-existent OSD is considered down by the
      mapping code
      
      2087    for (i = 0; i < pg->pg_temp.len; i++) {
      2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
      2089                    if (ceph_can_shift_osds(pi))
      2090                            continue;
      2091
      2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;
      
      and so requests get directed to the second OSD in the set instead of
      the first, resulting in OSD-side errors like:
      
      [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
      
      and hung rbds on the client:
      
      [  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
      [  493.566805] rbd: rbd0:   result -6 xferred 400000
      [  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
      
      The fix is to decouple application from the decoding and:
      - apply new_weight first
      - apply new_state before new_up_client
      - twiddle osd_state flags if marking in
      - clear out some of the state if osd is destroyed
      
      Fixes: http://tracker.ceph.com/issues/14901
      
      Cc: stable@vger.kernel.org # 3.15+: 6dd74e44: libceph: set 'exists' flag for newly up osd
      Cc: stable@vger.kernel.org # 3.15+
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
      930c5328
  2. 31 5月, 2016 1 次提交
    • I
      libceph: use %s instead of %pE in dout()s · 4a3262b1
      Ilya Dryomov 提交于
      Commit d30291b9 ("libceph: variable-sized ceph_object_id") changed
      dout()s in what is now encode_request() and ceph_object_locator_to_pg()
      to use %pE, mostly to document that, although all rbd and cephfs object
      names are NULL-terminated strings, ceph_object_id will handle any RADOS
      object name, including the one containing NULs, just fine.
      
      However, it turns out that vbin_printf() can't handle anything but ints
      and %s - all %p suffixes are ignored.  The buffer %p** points to isn't
      recorded, resulting in trash in the messages if the buffer had been
      reused by the time bstr_printf() got to it.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      4a3262b1
  3. 26 5月, 2016 9 次提交
  4. 05 2月, 2016 1 次提交
  5. 09 9月, 2015 1 次提交
  6. 01 7月, 2015 1 次提交
  7. 22 4月, 2015 1 次提交
    • I
      crush: straw2 bucket type with an efficient 64-bit crush_ln() · 958a2765
      Ilya Dryomov 提交于
      This is an improved straw bucket that correctly avoids any data movement
      between items A and B when neither A nor B's weights are changed.  Said
      differently, if we adjust the weight of item C (including adding it anew
      or removing it completely), we will only see inputs move to or from C,
      never between other items in the bucket.
      
      Notably, there is not intermediate scaling factor that needs to be
      calculated.  The mapping function is a simple function of the item weights.
      
      The below commits were squashed together into this one (mostly to avoid
      adding and then yanking a ~6000 lines worth of crush_ln_table):
      
      - crush: add a straw2 bucket type
      - crush: add crush_ln to calculate nature log efficently
      - crush: improve straw2 adjustment slightly
      - crush: change crush_ln to provide 32 more digits
      - crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
      - crush/mapper: fix divide-by-0 in straw2
        (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
         to create a proper compat.h in ceph.git)
      
      Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                                32a1ead92efcd351822d22a5fc37d159c65c1338,
                                6289912418c4a3597a11778bcf29ed5415117ad9,
                                35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                                6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                                b5921d55d16796e12d66ad2c4add7305f9ce2353.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      958a2765
  8. 15 10月, 2014 2 次提交
  9. 17 5月, 2014 1 次提交
  10. 29 4月, 2014 1 次提交
    • I
      libceph: fix non-default values check in apply_primary_affinity() · 92b2e751
      Ilya Dryomov 提交于
      osd_primary_affinity array is indexed into incorrectly when checking
      for non-default primary-affinity values.  This nullifies the impact of
      the rest of the apply_primary_affinity() and results in misdirected
      requests.
      
                      if (osds[i] != CRUSH_ITEM_NONE &&
                          osdmap->osd_primary_affinity[i] !=
                                                      ^^^
                                              CEPH_OSD_DEFAULT_PRIMARY_AFFINITY) {
      
      For a pool with size 2, this always ends up checking osd0 and osd1
      primary_affinity values, instead of the values that correspond to the
      osds in question.  E.g., given a [2,3] up set and a [max,max,0,max]
      primary affinity vector, requests are still sent to osd2, because both
      osd0 and osd1 happen to have max primary_affinity values and therefore
      we return from apply_primary_affinity() early on the premise that all
      osds in the given set have max (default) values.  Fix it.
      
      Fixes: http://tracker.ceph.com/issues/7954Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      92b2e751
  11. 05 4月, 2014 21 次提交