1. 25 6月, 2015 2 次提交
  2. 21 5月, 2015 2 次提交
    • I
      Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" · 521a04d0
      Ilya Dryomov 提交于
      This reverts commit ba9d114e.
      
      .. which introduced a regression that prevented all lingering requests
      requeued in kick_requests() from ever being sent to the OSDs, resulting
      in a lot of missed notifies.  In retrospect it's pretty obvious that
      r_req_lru_item item in the case of lingering requests can be used not
      only for notarget, but also for unsent linkage due to how tightly
      actual map and enqueue operations are coupled in __map_request().
      
      The assertion that was being silenced is taken care of in the previous
      ("libceph: request a new osdmap if lingering request maps to no osd")
      commit: by always kicking homeless lingering requests we ensure that
      none of them ends up on the notarget list outside of the critical
      section guarded by request_mutex.
      
      Cc: stable@vger.kernel.org # 3.18+, needs b0494532 "libceph: request a new osdmap if lingering request maps to no osd"
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      521a04d0
    • I
      libceph: request a new osdmap if lingering request maps to no osd · b0494532
      Ilya Dryomov 提交于
      This commit does two things.  First, if there are any homeless
      lingering requests, we now request a new osdmap even if the osdmap that
      is being processed brought no changes, i.e. if a given lingering
      request turned homeless in one of the previous epochs and remained
      homeless in the current epoch.  Not doing so leaves us with a stale
      osdmap and as a result we may miss our window for reestablishing the
      watch and lose notifies.
      
      MON=1 OSD=1:
      
          # cat linger-needmap.sh
          #!/bin/bash
          rbd create --size 1 test
          DEV=$(rbd map test)
          ceph osd out 0
          rbd map dne/dne # obtain a new osdmap as a side effect (!)
          sleep 1
          ceph osd in 0
          rbd resize --size 2 test
          # rbd info test | grep size -> 2M
          # blockdev --getsize $DEV -> 1M
      
      N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
      above is enough to make it miss that resize notify, but that is a
      bug^Wlimitation of ceph watch/notify v1.
      
      Second, homeless lingering requests are now kicked just like those
      lingering requests whose mapping has changed.  This is mainly to
      recognize that a homeless lingering request makes no sense and to
      preserve the invariant that a registered lingering request is not
      sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
      which commit ba9d114e ("libceph: clear r_req_lru_item in
      __unregister_linger_request()") tried to fix the _wrong_ way.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      b0494532
  3. 19 2月, 2015 2 次提交
    • I
      libceph: kfree() in put_osd() shouldn't depend on authorizer · b28ec2f3
      Ilya Dryomov 提交于
      a255651d ("ceph: ensure auth ops are defined before use") made
      kfree() in put_osd() conditional on the authorizer.  A mechanical
      mistake most likely - fix it.
      
      Cc: Alex Elder <elder@linaro.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      b28ec2f3
    • I
      libceph: fix double __remove_osd() problem · 7eb71e03
      Ilya Dryomov 提交于
      It turns out it's possible to get __remove_osd() called twice on the
      same OSD.  That doesn't sit well with rb_erase() - depending on the
      shape of the tree we can get a NULL dereference, a soft lockup or
      a random crash at some point in the future as we end up touching freed
      memory.  One scenario that I was able to reproduce is as follows:
      
                  <osd3 is idle, on the osd lru list>
      <con reset - osd3>
      con_fault_finish()
        osd_reset()
                                    <osdmap - osd3 down>
                                    ceph_osdc_handle_map()
                                      <takes map_sem>
                                      kick_requests()
                                        <takes request_mutex>
                                        reset_changed_osds()
                                          __reset_osd()
                                            __remove_osd()
                                        <releases request_mutex>
                                      <releases map_sem>
          <takes map_sem>
          <takes request_mutex>
          __kick_osd_requests()
            __reset_osd()
              __remove_osd() <-- !!!
      
      A case can be made that osd refcounting is imperfect and reworking it
      would be a proper resolution, but for now Sage and I decided to fix
      this by adding a safe guard around __remove_osd().
      
      Fixes: http://tracker.ceph.com/issues/8087
      
      Cc: Sage Weil <sage@redhat.com>
      Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd()
      Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts
      Cc: stable@vger.kernel.org # 3.9+
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      7eb71e03
  4. 18 12月, 2014 4 次提交
  5. 14 11月, 2014 3 次提交
  6. 15 10月, 2014 5 次提交
  7. 08 7月, 2014 9 次提交
  8. 12 6月, 2014 1 次提交
  9. 05 4月, 2014 2 次提交
  10. 03 4月, 2014 2 次提交
  11. 14 2月, 2014 1 次提交
  12. 08 2月, 2014 2 次提交
  13. 04 2月, 2014 1 次提交
  14. 28 1月, 2014 4 次提交