1. 09 9月, 2015 6 次提交
  2. 10 7月, 2015 2 次提交
    • I
      libceph: treat sockaddr_storage with uninitialized family as blank · c44bd69c
      Ilya Dryomov 提交于
      addr_is_blank() should return true if family is neither AF_INET nor
      AF_INET6.  This is what its counterpart entity_addr_t::is_blank_ip() is
      doing and it is the right thing to do: in process_banner() we check if
      our address is blank and if it is "learn" it from our peer.  As it is,
      we never learn our address and always send out a blank one.  This goes
      way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
      some ipv6 support groundwork") from 2009.
      
      While at at, do not open-code ipv6_addr_any() and use INADDR_ANY
      constant instead of 0.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      c44bd69c
    • I
      libceph: enable ceph in a non-default network namespace · 757856d2
      Ilya Dryomov 提交于
      Grab a reference on a network namespace of the 'rbd map' (in case of
      rbd) or 'mount' (in case of ceph) process and use that to open sockets
      instead of always using init_net and bailing if network namespace is
      anything but init_net.  Be careful to not share struct ceph_client
      instances between different namespaces and don't add any code in the
      !CONFIG_NET_NS case.
      
      This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      757856d2
  3. 01 7月, 2015 1 次提交
  4. 30 6月, 2015 1 次提交
  5. 25 6月, 2015 9 次提交
  6. 21 5月, 2015 2 次提交
    • I
      Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" · 521a04d0
      Ilya Dryomov 提交于
      This reverts commit ba9d114e.
      
      .. which introduced a regression that prevented all lingering requests
      requeued in kick_requests() from ever being sent to the OSDs, resulting
      in a lot of missed notifies.  In retrospect it's pretty obvious that
      r_req_lru_item item in the case of lingering requests can be used not
      only for notarget, but also for unsent linkage due to how tightly
      actual map and enqueue operations are coupled in __map_request().
      
      The assertion that was being silenced is taken care of in the previous
      ("libceph: request a new osdmap if lingering request maps to no osd")
      commit: by always kicking homeless lingering requests we ensure that
      none of them ends up on the notarget list outside of the critical
      section guarded by request_mutex.
      
      Cc: stable@vger.kernel.org # 3.18+, needs b0494532 "libceph: request a new osdmap if lingering request maps to no osd"
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      521a04d0
    • I
      libceph: request a new osdmap if lingering request maps to no osd · b0494532
      Ilya Dryomov 提交于
      This commit does two things.  First, if there are any homeless
      lingering requests, we now request a new osdmap even if the osdmap that
      is being processed brought no changes, i.e. if a given lingering
      request turned homeless in one of the previous epochs and remained
      homeless in the current epoch.  Not doing so leaves us with a stale
      osdmap and as a result we may miss our window for reestablishing the
      watch and lose notifies.
      
      MON=1 OSD=1:
      
          # cat linger-needmap.sh
          #!/bin/bash
          rbd create --size 1 test
          DEV=$(rbd map test)
          ceph osd out 0
          rbd map dne/dne # obtain a new osdmap as a side effect (!)
          sleep 1
          ceph osd in 0
          rbd resize --size 2 test
          # rbd info test | grep size -> 2M
          # blockdev --getsize $DEV -> 1M
      
      N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
      above is enough to make it miss that resize notify, but that is a
      bug^Wlimitation of ceph watch/notify v1.
      
      Second, homeless lingering requests are now kicked just like those
      lingering requests whose mapping has changed.  This is mainly to
      recognize that a homeless lingering request makes no sense and to
      preserve the invariant that a registered lingering request is not
      sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
      which commit ba9d114e ("libceph: clear r_req_lru_item in
      __unregister_linger_request()") tried to fix the _wrong_ way.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      b0494532
  7. 11 5月, 2015 1 次提交
  8. 22 4月, 2015 3 次提交
    • I
      crush: straw2 bucket type with an efficient 64-bit crush_ln() · 958a2765
      Ilya Dryomov 提交于
      This is an improved straw bucket that correctly avoids any data movement
      between items A and B when neither A nor B's weights are changed.  Said
      differently, if we adjust the weight of item C (including adding it anew
      or removing it completely), we will only see inputs move to or from C,
      never between other items in the bucket.
      
      Notably, there is not intermediate scaling factor that needs to be
      calculated.  The mapping function is a simple function of the item weights.
      
      The below commits were squashed together into this one (mostly to avoid
      adding and then yanking a ~6000 lines worth of crush_ln_table):
      
      - crush: add a straw2 bucket type
      - crush: add crush_ln to calculate nature log efficently
      - crush: improve straw2 adjustment slightly
      - crush: change crush_ln to provide 32 more digits
      - crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
      - crush/mapper: fix divide-by-0 in straw2
        (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need
         to create a proper compat.h in ceph.git)
      
      Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                                32a1ead92efcd351822d22a5fc37d159c65c1338,
                                6289912418c4a3597a11778bcf29ed5415117ad9,
                                35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                                6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                                b5921d55d16796e12d66ad2c4add7305f9ce2353.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      958a2765
    • I
      crush: ensuring at most num-rep osds are selected · 45002267
      Ilya Dryomov 提交于
      Crush temporary buffers are allocated as per replica size configured
      by the user.  When there are more final osds (to be selected as per
      rule) than the replicas, buffer overlaps and it causes crash.  Now, it
      ensures that at most num-rep osds are selected even if more number of
      osds are allowed by the rule.
      
      Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
                                234b066ba04976783d15ff2abc3e81b6cc06fb10.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      45002267
    • I
      crush: drop unnecessary include from mapper.c · 9be6df21
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      9be6df21
  9. 20 4月, 2015 3 次提交
  10. 08 4月, 2015 1 次提交
    • I
      Revert "libceph: use memalloc flags for net IO" · 6d7fdb0a
      Ilya Dryomov 提交于
      This reverts commit 89baaa57.
      
      Dirty page throttling should be sufficient for us in the general case
      so there is no need to use __GFP_MEMALLOC - it would be needed only in
      the swap-over-rbd case, which we currently don't support.  (It would
      probably take approximately the commit that is being reverted to add
      that support, but we would also need the "swap" option to distinguish
      from the general case and make sure swap ceph_client-s aren't shared
      with anything else.)  See ceph-devel threads [1] and [2] for the
      details of why enabling pfmemalloc reserves for all cases is a bad
      thing.
      
      On top of potential system lockups related to drained emergency
      reserves, this turned out to cause ceph lockups in case peers are on
      the same host and communicating via loopback due to sk_filter()
      dropping pfmemalloc skbs on the receiving side because the receiving
      loopback socket is not tagged with SOCK_MEMALLOC.
      
      [1] "SOCK_MEMALLOC vs loopback"
          http://www.spinics.net/lists/ceph-devel/msg22998.html
      [2] "[PATCH] libceph: don't set memalloc flags in loopback case"
          http://www.spinics.net/lists/ceph-devel/msg23392.html
      
      Conflicts:
      	net/ceph/messenger.c [ context: tcp_nodelay option ]
      
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Sage Weil <sage@redhat.com>
      Cc: stable@vger.kernel.org # 3.18+, needs backporting
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Acked-by: NMike Christie <michaelc@cs.wisc.edu>
      Acked-by: NMel Gorman <mgorman@suse.de>
      6d7fdb0a
  11. 19 2月, 2015 5 次提交
  12. 12 2月, 2015 1 次提交
  13. 09 1月, 2015 1 次提交
  14. 18 12月, 2014 4 次提交