1. 20 Aug, 2015 (1 commit)
  2. 15 Aug, 2015 (1 commit)
    • zram: fix pool name truncation · 4ce321f5
      Sergey Senozhatsky committed
      zram_meta_alloc() constructs a pool name for zs_create_pool() call as
      
          snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
      
      However, it defines the pool name buffer to be only 8 bytes long (7
      characters plus the trailing zero), which means that we can have only
      1000 distinct pool names: zram0 -- zram999.
      
      With CONFIG_ZSMALLOC_STAT enabled, an attempt to create device zram1000
      can fail if device zram100 already exists, because snprintf() will
      truncate the new pool name to zram100 and pass it to
      debugfs_create_dir(), causing:
      
        debugfs dir <zram100> creation failed
        zram: Error creating memory pool
      
      ... and so on.
      
      Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
      device_id.  We construct the zram%d name earlier and keep it as
      ->disk_name, so there is no need to snprintf() it again.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
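
      A minimal userspace sketch of the truncation described above; the
      buffer size and format string are taken from the commit message, the
      surrounding program is illustrative only:

          #include <stdio.h>

          int main(void)
          {
              char pool_name[8];      /* same size as the old zram buffer */
              int device_id = 1000;

              /* snprintf() always NUL-terminates, so only 7 chars fit */
              snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);

              /* prints "zram100" - collides with the pool of device 100 */
              printf("%s\n", pool_name);
              return 0;
          }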
  3. 31 Jul, 2015 (1 commit)
    • rbd: fix copyup completion race · 2761713d
      Ilya Dryomov committed
      For write/discard obj_requests that involved a copyup method call, the
      opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
      rbd_img_obj_copyup_callback().  The latter frees the copyup pages, sets
      ->xferred and delegates to rbd_img_obj_callback(), the "normal" image
      object callback, for reporting to the block layer and putting refs.
      
      rbd_osd_req_callback(), however, treats CEPH_OSD_OP_CALL as a trivial
      op, which means obj_request is marked done in rbd_osd_trivial_callback(),
      *before* ->callback is invoked and rbd_img_obj_copyup_callback() has
      a chance to run.  Marking obj_request done essentially means giving
      rbd_img_obj_callback() a license to end it at any moment, so if another
      obj_request from the same img_request is being completed concurrently,
      rbd_img_obj_end_request() may very well be called on such a prematurely
      marked-done request:
      
      <obj_request-1/2 reply>
      handle_reply()
        rbd_osd_req_callback()
          rbd_osd_trivial_callback()
          rbd_obj_request_complete()
          rbd_img_obj_copyup_callback()
          rbd_img_obj_callback()
                                          <obj_request-2/2 reply>
                                          handle_reply()
                                            rbd_osd_req_callback()
                                              rbd_osd_trivial_callback()
            for_each_obj_request(obj_request->img_request) {
              rbd_img_obj_end_request(obj_request-1/2)
              rbd_img_obj_end_request(obj_request-2/2) <--
            }
      
      Calling rbd_img_obj_end_request() on such a request leads to trouble,
      in particular because its ->xferred is 0.  We report 0 to the block
      layer with blk_update_request(), get back 1 for "this request has more
      data in flight" and then trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      with the rhs (which == ...) being 1 because rbd_img_obj_end_request()
      has been called for both requests and the lhs (more) being 1 because we
      haven't had a chance to set ->xferred in rbd_img_obj_copyup_callback()
      yet.
      
      To fix this, leverage the fact that rbd calls class methods in only two
      cases: one is a generic method call wrapper (obj_request is standalone)
      and the other is a copyup (obj_request is part of an img_request).  So
      make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
      rbd_img_obj_copyup_callback() from it if obj_request is part of an
      img_request, similar to how the CEPH_OSD_OP_READ handler invokes
      rbd_img_obj_request_read_callback().
      
      Since rbd_img_obj_copyup_callback() is now being called from the OSD
      request callback (only), it is renamed to rbd_osd_copyup_callback().
      
      Cc: Alex Elder <elder@linaro.org>
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
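
      A hedged sketch of the dedicated CEPH_OSD_OP_CALL handler described
      above (it assumes rbd.c's existing obj_request_img_data_test() and
      obj_request_done_set() helpers; this is not the exact upstream diff):

          static void rbd_osd_call_callback(struct rbd_obj_request *obj_request)
          {
              if (obj_request_img_data_test(obj_request))
                  /* copyup: finish via the img_request-aware callback */
                  rbd_osd_copyup_callback(obj_request);
              else
                  /* standalone method call: trivial completion is fine */
                  obj_request_done_set(obj_request);
          }

          /* ... and in rbd_osd_req_callback()'s opcode switch: */
          case CEPH_OSD_OP_CALL:
              rbd_osd_call_callback(obj_request);
              break;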
  4. 24 Jul, 2015 (3 commits)
  5. 23 Jul, 2015 (1 commit)
  6. 16 Jul, 2015 (1 commit)
  7. 02 Jul, 2015 (1 commit)
  8. 01 Jul, 2015 (1 commit)
    • rbd: use GFP_NOIO in rbd_obj_request_create() · 5a60e876
      Ilya Dryomov committed
      rbd_obj_request_create() is called on the main I/O path, so we need to
      use GFP_NOIO to make sure allocation doesn't blow back on us.  Not all
      callers need this, but I'm still hardcoding the flag inside rather than
      making it a parameter because a) this is going to stable, and b) those
      callers shouldn't really use rbd_obj_request_create() and will be fixed
      in the future.
      
      More memory allocation fixes will follow.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
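
      A hedged sketch of the allocation change described above (the function
      and cache names mirror rbd.c, but this is not the exact diff):

          static struct rbd_obj_request *
          rbd_obj_request_create(const char *object_name, u64 offset,
                                 u64 length, enum obj_request_type type)
          {
              struct rbd_obj_request *obj_request;

              /* main I/O path: GFP_NOIO keeps the allocator from recursing
               * back into block I/O while we are servicing a request */
              obj_request = kmem_cache_zalloc(rbd_obj_request_cache, GFP_NOIO);
              if (!obj_request)
                  return NULL;

              /* ... rest of the initialization is unchanged ... */
              return obj_request;
          }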
  9. 28 Jun, 2015 (6 commits)
  10. 26 Jun, 2015 (13 commits)
  11. 25 Jun, 2015 (8 commits)
    • rbd: queue_depth map option · b5584180
      Ilya Dryomov committed
      nr_requests (/sys/block/rbd<id>/queue/nr_requests) is pretty much
      irrelevant in the blk-mq case, because each driver sets its own maximum
      depth that it can handle, and that is the number of tags that gets
      preallocated on setup.  Users can't increase the queue depth beyond
      that value by writing to nr_requests.
      
      For rbd we are happy with the default BLKDEV_MAX_RQ (128) for most
      cases but we want to give users the opportunity to increase it.
      Introduce a new per-device queue_depth option to do just that:
      
          $ sudo rbd map -o queue_depth=1024 ...
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
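
      A hedged sketch of how such a queue_depth map option can be wired into
      blk-mq (field names mirror rbd.c; the exact parsing and validation
      shown here are assumptions, not the upstream diff):

          /* option parsing: reject nonsensical values */
          case Opt_queue_depth:
              if (intval < 1) {
                  pr_err("queue_depth out of range\n");
                  return -EINVAL;
              }
              rbd_opts->queue_depth = intval;
              break;

          /* queue setup: the tag set depth is what blk-mq preallocates */
          rbd_dev->tag_set.queue_depth = rbd_dev->opts->queue_depth;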
    • rbd: store rbd_options in rbd_device · d147543d
      Ilya Dryomov committed
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • rbd: terminate rbd_opts_tokens with Opt_err · 210c104c
      Ilya Dryomov committed
      Also nuke useless Opt_last_bool and don't break lines unnecessarily.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
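
      For reference, the match_table_t idiom the commit title refers to: the
      table has to end with a catch-all entry, here Opt_err, so that
      match_token() always finds a terminator.  This is an illustrative
      subset, not the full rbd table:

          enum {
              Opt_queue_depth,
              Opt_read_only,
              Opt_read_write,
              Opt_err
          };

          static match_table_t rbd_opts_tokens = {
              {Opt_queue_depth, "queue_depth=%d"},
              {Opt_read_only, "ro"},
              {Opt_read_write, "rw"},
              {Opt_err, NULL}     /* terminator */
          };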
    • rbd: bump queue_max_segments · d3834fef
      Ilya Dryomov committed
      The default queue_limits::max_segments value (BLK_MAX_SEGMENTS = 128)
      unnecessarily limits bio sizes to 512k (assuming 4k pages).  rbd, being
      a virtual block device, doesn't have any restrictions on the number of
      physical segments, so bump max_segments to max_hw_sectors, in theory
      allowing a sector per segment (although the only case where this
      matters that I can think of is some readv/writev style thing).  In
      practice this is going to give us 1M bios - the number of segments in
      a bio is limited in bio_get_nr_vecs() by BIO_MAX_PAGES = 256.

      Note that this doesn't result in any improvement on a typical direct
      sequential test.  This is because on a box with not too badly
      fragmented memory the default BLK_MAX_SEGMENTS is enough to produce
      requests of a full rbd object size.  The only difference is the size
      of the bios being merged - 512k vs 1M for something like
      
          $ dd if=/dev/zero of=/dev/rbd0 oflag=direct bs=$RBD_OBJ_SIZE
          $ dd if=/dev/rbd0 iflag=direct of=/dev/null bs=$RBD_OBJ_SIZE
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
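
      A hedged sketch of the queue limit change described above, as it would
      appear in rbd's queue setup (segment_size is the rbd object size in
      bytes and SECTOR_SIZE is 512; the surrounding context is assumed):

          blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
          /* was blk_queue_max_segments(q, BLK_MAX_SEGMENTS), i.e. 128 */
          blk_queue_max_segments(q, segment_size / SECTOR_SIZE);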
    • rbd: timeout watch teardown on unmap with mount_timeout · 2894e1d7
      Ilya Dryomov committed
      As part of the unmap sequence, the kernel client has to talk to the
      OSDs to tear down the watch on the header object.  If none of the OSDs
      are available it would hang forever, until interrupted by a signal -
      when that happens we follow through with the rest of the unmap
      procedure (i.e. unregister the device and put all the data structures)
      and the unmap is still considered successful (the rbd cli tool exits
      with 0).  The watch on the userspace side should eventually time out,
      so that's fine.

      This isn't very nice, because various userspace tools (the pacemaker
      rbd resource agent, for example) then have to worry about setting up
      their own timeouts.  Time the wait out with mount_timeout (60 seconds
      by default).
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Sage Weil <sage@redhat.com>
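
      A hedged sketch of the bounded wait described above, not the upstream
      diff: the helper name rbd_obj_request_wait_timeout() is hypothetical,
      and it assumes mount_timeout is already stored in jiffies and run
      through the ceph_timeout_jiffies() helper from the libceph commit
      below:

          static int rbd_obj_request_wait_timeout(struct rbd_obj_request *obj_request,
                                                  unsigned long timeout)
          {
              long ret;

              /* previously an unbounded wait for the teardown reply */
              ret = wait_for_completion_interruptible_timeout(
                              &obj_request->completion,
                              ceph_timeout_jiffies(timeout));
              if (ret <= 0)
                      /* interrupted (-ERESTARTSYS) or timed out (0) */
                      return ret ?: -ETIMEDOUT;
              return 0;
          }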
    • libceph: store timeouts in jiffies, verify user input · a319bf56
      Ilya Dryomov committed
      There are currently three libceph-level timeouts that the user can
      specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
      these are in seconds and no checking is done on user input: negative
      values are accepted, we multiply them all by HZ which may or may not
      overflow, arbitrarily large jiffies then get added together, etc.
      
      There is also a bug in the way mount_timeout=0 is handled.  It's
      supposed to mean "infinite timeout", but that's not how the wait.h APIs
      treat it, and so __ceph_open_session(), for example, will busy-loop
      without much chance of being interrupted if none of the ceph-mons are
      there.
      
      Fix all this by verifying user input, storing timeouts capped by
      msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
      helper for all user-specified waits to handle infinite timeouts
      correctly.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
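
      A hedged sketch of the two pieces described above: parsing that bounds
      the user input before converting seconds to jiffies, and a helper that
      maps the "infinite" value 0 to MAX_SCHEDULE_TIMEOUT.  The validation
      bounds shown are assumptions:

          /* option parsing: seconds from the user, stored as jiffies */
          case Opt_mount_timeout:
              if (intval < 0 || intval > INT_MAX / 1000) {
                  pr_err("mount_timeout out of range\n");
                  return -EINVAL;
              }
              opt->mount_timeout = msecs_to_jiffies(intval * 1000);
              break;

          /* 0 means "no timeout": wait_*_timeout() callers wait forever */
          static inline unsigned long ceph_timeout_jiffies(unsigned long timeout)
          {
              return timeout ?: MAX_SCHEDULE_TIMEOUT;
          }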
    • libceph: allow setting osd_req_op's flags · 144cba14
      Yan, Zheng committed
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • libnvdimm, pmem: move pmem to drivers/nvdimm/ · 18da2c9e
      Dan Williams committed
      Prepare the pmem driver to consume PMEM namespaces emitted by regions of
      an nvdimm_bus instance.  No functional change.
      Acked-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  12. 24 Jun, 2015 (3 commits)