  1. 23 Oct 2015, 4 commits
  2. 16 Oct 2015, 2 commits
    • rbd: use writefull op for object size writes · e30b7577
      Committed by Ilya Dryomov
      This covers only the simplest case, an object-sized write, but
      it's still useful in tiering setups where EC is used for the base
      tier, as the writefull op can be proxied, saving an object promotion.
      
      Even though updating ceph_osdc_new_request() to allow writefull should
      just be a matter of fixing an assert, I didn't do it because its only
      user is cephfs.  All other sites were updated.
      
      Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
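The selection rule in this commit (writefull only for a write that covers a whole object) can be sketched as a small C helper. This is a minimal illustration under stated assumptions, not the rbd code. The `obj_write_op` enum and `choose_write_op()` are hypothetical names, and the 4 MiB object size mentioned below is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OSD write opcodes; not the libceph enums. */
enum obj_write_op { OBJ_WRITE, OBJ_WRITEFULL };

/*
 * Choose writefull only for the simplest case the commit covers: the
 * write starts at offset 0 within the object and spans exactly
 * object_size bytes. Anything else keeps the ordinary write op.
 */
static enum obj_write_op choose_write_op(uint64_t obj_off, uint64_t len,
                                         uint64_t object_size)
{
    assert(object_size > 0);
    if (obj_off == 0 && len == object_size)
        return OBJ_WRITEFULL;
    return OBJ_WRITE;
}
```

Take a 4 MiB object as an example. Only a write at offset 0 with a length of 4194304 bytes maps to writefull. A 4 KiB write at the same offset stays a plain write.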
    • rbd: set max_sectors explicitly · 0d9fde4f
      Committed by Ilya Dryomov
      Commit 30e2bc08 ("Revert "block: remove artifical max_hw_sectors
      cap"") restored a clamp on max_sectors.  It's now 2560 sectors instead
      of 1024, but it's not good enough: we set max_hw_sectors to rbd object
      size because we don't want object sized I/Os to be split, and the
      default object size is 4M.
      
      So, set max_sectors to max_hw_sectors in rbd at queue init time.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
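The arithmetic behind this commit can be checked with a short C sketch. The 2560-sector clamp and the 4M default object size come from the message above. `bytes_to_sectors()`, `nr_requests()` and the constant names are hypothetical helpers. Treating the split count as a ceiling division is a simplification of what the block layer actually does.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9u            /* 512-byte sectors */
#define CLAMPED_MAX_SECTORS 2560u  /* clamp value quoted in the commit */

/* Convert an object size in bytes to 512-byte sectors. */
static unsigned int bytes_to_sectors(uint64_t bytes)
{
    assert(bytes % (1u << SECTOR_SHIFT) == 0);
    return (unsigned int)(bytes >> SECTOR_SHIFT);
}

/* Number of requests an I/O of io_sectors is split into at max_sectors. */
static unsigned int nr_requests(unsigned int io_sectors, unsigned int max_sectors)
{
    assert(max_sectors > 0);
    return (io_sectors + max_sectors - 1) / max_sectors;
}
```

A 4 MiB object is 8192 sectors. A 2560-sector max_sectors therefore splits an object-sized write into four requests. Setting max_sectors equal to max_hw_sectors (8192) keeps it as one.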
  3. 01 Oct 2015, 1 commit
  4. 24 Sep 2015, 2 commits
    • NVMe: Set affinity after allocating request queues · bda4e0fb
      Committed by Keith Busch
      The asynchronous namespace scanning caused affinity hints to be set
      before the tagset was initialized, so there was no CPU mask to set
      the hint from. This patch moves the affinity hint setting to after
      namespaces are scanned.
      Reported-by: 김경산 <ks0204.kim@samsung.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • xen/blkback: free requests on disconnection · f929d42c
      Committed by Roger Pau Monne
      This is due to commit 86839c56
      ("xen/block: add multi-page ring support").
      
      When using a guest under UEFI, the following warning comes from
      blkback after the domain is destroyed.
      
      ------------[ cut here ]------------
      WARNING: CPU: 2 PID: 95 at
      /home/julien/works/linux/drivers/block/xen-blkback/xenbus.c:274
      xen_blkif_deferred_free+0x1f4/0x1f8()
      Modules linked in:
      CPU: 2 PID: 95 Comm: kworker/2:1 Tainted: G        W       4.2.0 #85
      Hardware name: APM X-Gene Mustang board (DT)
      Workqueue: events xen_blkif_deferred_free
      Call trace:
      [<ffff8000000890a8>] dump_backtrace+0x0/0x124
      [<ffff8000000891dc>] show_stack+0x10/0x1c
      [<ffff8000007653bc>] dump_stack+0x78/0x98
      [<ffff800000097e88>] warn_slowpath_common+0x9c/0xd4
      [<ffff800000097f80>] warn_slowpath_null+0x14/0x20
      [<ffff800000557a0c>] xen_blkif_deferred_free+0x1f0/0x1f8
      [<ffff8000000ad020>] process_one_work+0x160/0x3b4
      [<ffff8000000ad3b4>] worker_thread+0x140/0x494
      [<ffff8000000b2e34>] kthread+0xd8/0xf0
      ---[ end trace 6f859b7883c88cdd ]---
      
      Request allocation has been moved to connect_ring, which is called every
      time blkback connects to the frontend (this can happen multiple times during
      a blkback instance life cycle). On the other hand, request freeing has not
      been moved, so it's only called when destroying the backend instance. Due to
      this mismatch, blkback can allocate the request pool multiple times, without
      freeing it.
      
      In order to fix it, move the freeing of requests to xen_blkif_disconnect to
      restore the symmetry between request allocation and freeing.
      Reported-by: Julien Grall <julien.grall@citrix.com>
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Tested-by: Julien Grall <julien.grall@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: xen-devel@lists.xenproject.org
      CC: stable@vger.kernel.org # 4.2
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
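The fix restores a symmetry: the pool is allocated on connect and freed on disconnect. This can be modelled in userspace C. It is a hypothetical sketch, not blkback code. `struct backend`, `backend_connect()`, `backend_disconnect()` and `backend_destroy()` stand in for connect_ring, xen_blkif_disconnect and the backend teardown. The allocation and free counters exist only to make the symmetry checkable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down backend instance; not struct xen_blkif. */
struct backend {
    void **req_pool;             /* request pool, allocated on every connect */
    size_t nr_reqs;
    unsigned int allocs, frees;  /* counters used to check symmetry */
};

/* Mirrors connect_ring(): may run several times per backend lifetime. */
static int backend_connect(struct backend *be, size_t nr_reqs)
{
    assert(be->req_pool == NULL);  /* previous pool must already be gone */
    be->req_pool = calloc(nr_reqs, sizeof(*be->req_pool));
    if (!be->req_pool)
        return -1;
    be->nr_reqs = nr_reqs;
    be->allocs++;
    return 0;
}

/* The fix: free the pool on disconnect, symmetric with connect. */
static void backend_disconnect(struct backend *be)
{
    if (!be->req_pool)
        return;
    free(be->req_pool);
    be->req_pool = NULL;
    be->nr_reqs = 0;
    be->frees++;
}

/* Destroying the instance only tears down whatever connection is left. */
static void backend_destroy(struct backend *be)
{
    backend_disconnect(be);
}
```

After several connect/disconnect cycles, `allocs == frees`. Freeing only at destroy time, as before the fix, would have leaked every pool except the last.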
  5. 18 Sep 2015, 1 commit
  6. 09 Sep 2015, 6 commits
    • zram: unify error reporting · 70864969
      Committed by Sergey Senozhatsky
      Make zram syslog error reporting more consistent. We have random
      error levels in some places. For example, critical errors like
        "Error allocating memory for compressed page"
      and
        "Unable to allocate temp memory"
      are reported as KERN_INFO messages.
      
      a) Reassign error levels
      
      Error messages that directly affect zram
      functionality -- pr_err():
      
       Error allocating zram address table
       Error creating memory pool
       Decompression failed! err=%d, page=%u
       Unable to allocate temp memory
       Compression failed! err=%d
       Error allocating memory for compressed page: %u, size=%zu
       Cannot initialise %s compressing backend
       Error allocating disk queue for device %d
       Error allocating disk structure for device %d
       Error creating sysfs group for device %d
       Unable to register zram-control class
       Unable to get major number
      
      Messages that do not affect functionality, but the user
      must be warned (because sysfs attrs will be removed in
      this particular case) -- pr_warn():
      
       %d (%s) Attribute %s (and others) will be removed. %s
      
      Messages that do not affect functionality and mostly are
      informative -- pr_info():
      
       Cannot change max compression streams
       Can't change algorithm for initialized device
       Cannot change disksize for initialized device
       Added device: %s
       Removed device: %s
      
      b) Update sysfs_create_group() error message
      
      First, it lacks a trailing newline; add it.  Second, every error message
      in zram_add() has a "for device %d" part, which makes errors more
      informative.  Add the missing part to the "Error creating sysfs group" message.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: account the number of compacted pages · 860c707d
      Committed by Sergey Senozhatsky
      Compaction returns to zram the number of migrated objects, which is
      quite uninformative: we have objects of different sizes, so user space
      cannot obtain any valuable data from that number.  Change compaction to
      operate in terms of pages and return to the compaction issuer the
      number of pages that were freed during compaction.  So from now on we
      will export a more meaningful value in zram<id>/mm_stat: the number of
      freed (compacted) pages.
      
      This requires:
       (a) a rename of `num_migrated' to `pages_compacted'
       (b) an internal API change: return first_page's fullness_group from
           putback_zspage(), so we know when putback_zspage() did
           free_zspage().  It helps us to account compaction stats correctly.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
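The new accounting can be sketched in C. The caller counts pages whenever the putback step reports that a zspage ended up empty. This is a simplified model under stated assumptions, not zsmalloc itself. `struct zspage_model`, `putback_zspage_model()` and `count_compacted_pages()` are hypothetical. The fullness enum is cut down, and the 3/4 threshold for "almost empty" was chosen for this sketch.

```c
#include <assert.h>

/* Simplified fullness groups; zsmalloc's real enum differs in detail. */
enum fullness_group { ZS_ALMOST_FULL, ZS_ALMOST_EMPTY, ZS_EMPTY, ZS_FULL };

/* Hypothetical zspage model: only the count of live objects matters here. */
struct zspage_model {
    int inuse;
};

static enum fullness_group get_fullness(const struct zspage_model *z, int max_objs)
{
    if (z->inuse == 0)
        return ZS_EMPTY;
    if (z->inuse == max_objs)
        return ZS_FULL;
    /* at most 3/4 full counts as almost empty in this sketch */
    return (z->inuse * 4 <= max_objs * 3) ? ZS_ALMOST_EMPTY : ZS_ALMOST_FULL;
}

/*
 * Like the changed putback_zspage(): report the group the zspage ended in.
 * In zsmalloc an empty zspage is freed at this point; the model only
 * reports ZS_EMPTY so the caller can account for it.
 */
static enum fullness_group putback_zspage_model(const struct zspage_model *z,
                                                int max_objs)
{
    return get_fullness(z, max_objs);
}

/* Sum freed pages instead of migrated objects. */
static unsigned long count_compacted_pages(const struct zspage_model *src, int n,
                                           int max_objs, int pages_per_zspage)
{
    unsigned long pages_compacted = 0;

    for (int i = 0; i < n; i++)
        if (putback_zspage_model(&src[i], max_objs) == ZS_EMPTY)
            pages_compacted += pages_per_zspage;
    return pages_compacted;
}
```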
    • zsmalloc/zram: introduce zs_pool_stats api · 7d3f3938
      Committed by Sergey Senozhatsky
      `zs_compact_control' accounts the number of migrated objects but it has
      a limited lifespan -- we lose it as soon as zs_compaction() returns back
      to zram.  It worked fine, because (a) zram had its own counter of
      migrated objects and (b) only zram could trigger compaction.  However,
      this does not work for automatic pool compaction (not issued by zram).
      To account objects migrated during auto-compaction (issued by the
      shrinker) we need to store this number in zs_pool.
      
      Define a new `struct zs_pool_stats' structure to keep zs_pool's stats
      there.  It provides only `num_migrated', as of this writing, but it
      surely can be extended.
      
      A new zsmalloc zs_pool_stats() symbol exports zs_pool's stats back to
      the caller.
      
      Use zs_pool_stats() in zram and remove `num_migrated' from zram_stats.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
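The sketch below is a cut-down C model that keeps stats in the pool rather than in a short-lived control structure. `struct zs_pool_stats` with its single `num_migrated` field follows the description above. `struct pool_model`, `pool_compact_model()` and `pool_stats_model()` are hypothetical stand-ins for zs_pool, compaction and zs_pool_stats(). Copying the stats out by value is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* As described: only num_migrated for now, extensible later. */
struct zs_pool_stats {
    unsigned long num_migrated;
};

/* Hypothetical cut-down pool; its stats outlive any single compaction. */
struct pool_model {
    struct zs_pool_stats stats;
};

/* Any compaction path (zram-issued or shrinker-issued) updates the pool. */
static void pool_compact_model(struct pool_model *pool, unsigned long migrated)
{
    pool->stats.num_migrated += migrated;
}

/* Export a snapshot of the pool's stats to the caller. */
static void pool_stats_model(const struct pool_model *pool,
                             struct zs_pool_stats *out)
{
    memcpy(out, &pool->stats, sizeof(*out));
}
```

Every compaction path updates the same pool counter. A snapshot taken after one zram-issued and one shrinker-issued compaction therefore reflects both.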
    • rbd: plug rbd_dev->header.object_prefix memory leak · d194cd1d
      Committed by Ilya Dryomov
      Need to free object_prefix when rbd_dev_v2_snap_context() fails, but
      only if this is the first time we are reading in the header.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
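The rule "free object_prefix on failure only if this call set it" can be sketched as follows. This is a hypothetical model, not rbd code. `struct header_model`, `read_header_model()` and `dup_str()` are invented for illustration. The `snapc_fails` flag simulates rbd_dev_v2_snap_context() failing, and the prefix string is a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down header; not struct rbd_image_header. */
struct header_model {
    char *object_prefix;
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p)
        memcpy(p, s, n);
    return p;
}

/*
 * first_time:  the header is being read for the first time, so the
 *              prefix is allocated by this call.
 * snapc_fails: simulate the snapshot-context step failing afterwards.
 */
static int read_header_model(struct header_model *h, int first_time,
                             int snapc_fails)
{
    if (first_time) {
        h->object_prefix = dup_str("placeholder_prefix");
        if (!h->object_prefix)
            return -ENOMEM;
    }
    if (snapc_fails) {
        /* Only undo what this call set up; on a refresh the prefix stays. */
        if (first_time) {
            free(h->object_prefix);
            h->object_prefix = NULL;
        }
        return -EIO;
    }
    return 0;
}
```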
    • rbd: fix double free on rbd_dev->header_name · 3ebe138a
      Committed by Ilya Dryomov
      If rbd_dev_image_probe() in rbd_dev_probe_parent() fails, header_name
      is freed twice: once in rbd_dev_probe_parent() and then in its caller
      rbd_dev_image_probe() (rbd_dev_image_probe() is called recursively to
      handle parent images).
      
      rbd_dev_probe_parent() is responsible for probing the parent, so it
      shouldn't muck with clone's fields.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
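The ownership rule can be modelled with a recursive probe in C: whoever sets header_name frees it, and the parent probe leaves the clone's fields alone. Everything here is hypothetical and much simpler than rbd. `struct dev_model`, `image_probe()`, `probe_parent()` and `release_chain()` only mirror the call structure described. `fail_depth` injects a failure at a chosen level. The counters make a double free show up as a mismatch between allocations and frees.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, cut-down device; not struct rbd_device. */
struct dev_model {
    char *header_name;
    struct dev_model *parent;
};

static int names_allocated, names_freed;

static int probe_parent(struct dev_model *clone, int depth, int max_depth,
                        int fail_depth);

/* Sets header_name and, on failure, is the only place that frees it. */
static int image_probe(struct dev_model *dev, int depth, int max_depth,
                       int fail_depth)
{
    int ret = 0;

    dev->header_name = malloc(32);
    if (!dev->header_name)
        return -ENOMEM;
    names_allocated++;

    if (depth == fail_depth)
        ret = -EIO;  /* simulated probe failure at this level */
    else if (depth < max_depth)
        ret = probe_parent(dev, depth, max_depth, fail_depth);

    if (ret) {
        free(dev->header_name);
        dev->header_name = NULL;
        names_freed++;
    }
    return ret;
}

/* Probes the parent only; it never touches the clone's header_name. */
static int probe_parent(struct dev_model *clone, int depth, int max_depth,
                        int fail_depth)
{
    struct dev_model *parent = calloc(1, sizeof(*parent));
    int ret;

    if (!parent)
        return -ENOMEM;
    ret = image_probe(parent, depth + 1, max_depth, fail_depth);
    if (ret) {
        free(parent);  /* its header_name was already freed by its own probe */
        return ret;
    }
    clone->parent = parent;
    return 0;
}

/* Tear down a successfully probed chain; dev itself is caller-owned. */
static void release_chain(struct dev_model *dev)
{
    struct dev_model *p = dev->parent;

    free(dev->header_name);
    dev->header_name = NULL;
    dev->parent = NULL;
    names_freed++;
    while (p) {
        struct dev_model *next = p->parent;
        free(p->header_name);
        names_freed++;
        free(p);
        p = next;
    }
}
```

In this model, a failure injected two levels down frees each of the three names exactly once. The clone ends up with no header_name and no parent.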
    • xen: Use correctly the Xen memory terminologies · 0df4f266
      Committed by Julien Grall
      Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
      is meant; I suspect this is because the first support for Xen was for
      PV. This resulted in some misimplementation of helpers on ARM and
      confused developers about the expected behavior.
      
      For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
      However, if we look at the implementation on x86, it returns a GFN.
      
      For clarity and to avoid new confusion, replace any reference to mfn with
      gfn in any helpers used by PV drivers. The x86 code will still keep some
      references to pfn_to_mfn, which may be used by all kinds of guests.
      No changes have been made in the hypercall fields, even
      though they may be invalid, in order to keep them the same as the
      definitions in the xen repo.
      
      Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
      name too close to the KVM function gfn_to_page.
      
      Also take the opportunity to simplify simple constructions such
      as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex cleanups
      will come in follow-up patches.
      
      [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  7. 08 Sep 2015, 2 commits
  8. 03 Sep 2015, 2 commits
  9. 28 Aug 2015, 1 commit
  10. 26 Aug 2015, 2 commits
    • NVMe: Using PRACT bit to generate and verify PI by controller · e19b127f
      Committed by Alok Pandey
      This patch enables PRCHK and reftag support when the PRACT bit is set
      and block layer integrity is disabled.
      Signed-off-by: Alok Pandey <pandey.alok@samsung.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • mtip32x: fix regression introduced by blk-mq per-hctx flush · 74c9c913
      Committed by Jeff Moyer
      Hi,
      
      After commit f70ced09 (blk-mq: support per-distpatch_queue flush
      machinery), the mtip32xx driver may oops upon module load due to walking
      off the end of an array in mtip_init_cmd.  On initialization of the
      flush_rq, init_request is called with request_index >= the maximum queue
      depth the driver supports.  For mtip32xx, this value is used to index
      into an array.  What this means is that the driver will walk off the end
      of the array, and either oops or cause random memory corruption.
      
      The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
      driver in a loop.  I can typically reproduce the problem in about 30
      seconds.
      
      Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
      I think we can simply return without doing anything.  In addition, no
      other mq-enabled driver does anything with the request_index passed into
      init_request(), so no other driver is affected.  However, I'm not really
      sure what is expected of drivers.  Ming, what did you envision drivers
      would do when initializing the flush requests?
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
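The sketch below shows the early return described above, assuming a hypothetical fixed-size per-command table. `QUEUE_DEPTH_MODEL`, `struct cmd_table_model` and `init_cmd_model()` are not mtip32xx names, and the real init_request callback signature differs. The point is only the bounds check on request_idx.

```c
#include <assert.h>

#define QUEUE_DEPTH_MODEL 4  /* hypothetical; not the mtip32xx slot count */

/* Hypothetical per-command table indexed by request_idx. */
struct cmd_table_model {
    int initialized[QUEUE_DEPTH_MODEL];
};

/*
 * Sketch of an init_request()-style callback. The flush request arrives
 * with request_idx >= queue depth; since this driver has no flush/FUA
 * support, return early instead of indexing past the end of the array.
 */
static int init_cmd_model(struct cmd_table_model *t, unsigned int request_idx)
{
    if (request_idx >= QUEUE_DEPTH_MODEL)
        return 0;
    t->initialized[request_idx] = 1;
    return 0;
}
```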
  11. 21 Aug 2015, 1 commit
  12. 20 Aug 2015, 3 commits
  13. 19 Aug 2015, 4 commits
  14. 18 Aug 2015, 1 commit
    • NVMe: Set queue max segments · e824410f
      Committed by Keith Busch
      This sets the queue's maximum number of segments to match the device's
      capabilities. The default of 128 is usable until a device's transfer
      capability exceeds 512k, assuming a device page size of 4k. Many nvme
      devices exceed that transfer limit, so this lets the block layer know what
      size of commands it is allowed to form rather than splitting them
      unnecessarily.
      
      One additional segment is added to account for a transfer that may start
      in the middle of a page.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
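The segment-limit reasoning can be checked with a small C sketch: one segment per device page of the maximum transfer, plus one for a transfer that starts in the middle of a page. `pages_spanned()` and `max_segments_for()` are hypothetical helpers, not the nvme driver code. Any maximum transfer size plugged in is an example value.

```c
#include <assert.h>

/* Pages touched by a transfer of len bytes starting at byte offset off. */
static unsigned int pages_spanned(unsigned int off, unsigned int len,
                                  unsigned int page_size)
{
    unsigned int start = off % page_size;  /* offset inside the first page */

    return (start + len + page_size - 1) / page_size;
}

/*
 * Segment limit as described: one per page of the maximum transfer,
 * plus one for a transfer that does not start on a page boundary.
 */
static unsigned int max_segments_for(unsigned int max_transfer,
                                     unsigned int page_size)
{
    assert(page_size > 0 && max_transfer % page_size == 0);
    return max_transfer / page_size + 1;
}
```

With 4k pages, 128 segments cover 512k. An 8k transfer starting 512 bytes into a page touches three pages, which is why the limit adds one.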
  15. 17 Aug 2015, 8 commits