提交 · 30d53d9c11b6c2f71253a2be582969d7e6fa7f10 · openanolis / cloud-kernel

17 8月, 2015 7 次提交

由 Markus Pargmann 提交于 8月 17, 2015

Add some debugfs files that help to understand the internal state of
NBD. This exports the different sizes, flags, tasks and so on.
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

30d53d9c

nbd: Remove variable 'pid' · 6521d39a

由 Markus Pargmann 提交于 8月 17, 2015

This patch uses nbd->task_recv to determine the value of the previously
used variable 'pid' for sysfs.
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

6521d39a

nbd: Move clear queue debug message · e78273c8

由 Markus Pargmann 提交于 8月 17, 2015

This message was a warning without a reason. This patch moves it into
nbd_clear_que and transforms it to a debug message.
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

e78273c8

nbd: Remove 'harderror' and propagate error properly · 19391830

由 Markus Pargmann 提交于 8月 17, 2015

Instead of a variable 'harderror' we can simply try to correctly
propagate errors to the userspace.

This patch removes the harderror variable and passes errors through
error pointers and nbd_do_it back to the userspace.
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

19391830

nbd: restructure sock_shutdown · 260bbce4

由 Markus Pargmann 提交于 8月 17, 2015

This patch restructures sock_shutdown to avoid having the main code path
in an if block.
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

260bbce4

nbd: sock_shutdown, remove conditional lock · 36e47bee

由 Markus Pargmann 提交于 8月 17, 2015

Move the conditional lock from sock_shutdown into the surrounding code.
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Acked-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NJens Axboe <axboe@fb.com>

36e47bee

nbd: Fix timeout detection · 7e2893a1

由 Markus Pargmann 提交于 8月 17, 2015

At the moment the nbd timeout just detects hanging tcp operations. This
is not enough to detect a hanging or bad connection as expected of a
timeout.

This patch redesigns the timeout detection to include some more cases.
The timeout is now in relation to replies from the server. If the server
does not send replies within the timeout the connection will be shut
down.

The patch adds a continous timer 'timeout_timer' that is setup in one of
two cases:
 - The request list is empty and we are sending the first request out to
   the server. We want to have a reply within the given timeout,
   otherwise we consider the connection to be dead.
 - A server response was received. This means the server is still
   communicating with us. The timer is reset to the timeout value.

The timer is not stopped if the list becomes empty. It will just trigger
a timeout which will directly leave the handling routine again as the
request list is empty.

The whole patch does not use any additional explicit locking. The
list_empty() calls are safe to be used concurrently. The timer is locked
internally as we just use mod_timer and del_timer_sync().

The patch is based on the idea of Michal Belczyk with a previous
different implementation.

Cc: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Signed-off-by: NMarkus Pargmann <mpa@pengutronix.de>
Tested-by: NHermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

7e2893a1

22 7月, 2015 1 次提交

nvme: Fixes u64 division which breaks i386 builds · c45f5c99

由 Jon Derrick 提交于 7月 21, 2015

Uses div_u64 for u64 division and round_down, a bitwise operation,
instead of rounddown, which uses a modulus.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

c45f5c99

21 7月, 2015 2 次提交

NVMe: Use CMB for the IO SQes if available · 8ffaadf7

由 Jon Derrick 提交于 7月 20, 2015

Some controllers have a controller-side memory buffer available for use
for submissions, completions, lists, or data.

If a CMB is available, the entire CMB will be ioremapped and it will
attempt to map the IO SQes onto the CMB. The queues will be shrunk as
needed. The CMB will not be used if the queue depth is shrunk below some
threshold where it may have reduced performance over a larger queue
in system memory.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

8ffaadf7

NVMe: Unify SQ entry writing and doorbell ringing · 498c4394

由 Jon Derrick 提交于 7月 20, 2015

This patch changes sq_cmd writers to instead create their command on
the stack. __nvme_submit_cmd copies the sq entry to the queue and writes
the doorbell.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

498c4394

17 7月, 2015 1 次提交

block: have drivers use blk_queue_max_discard_sectors() · 2bb4cd5c

由 Jens Axboe 提交于 7月 14, 2015

Some drivers use it now, others just set the limits field manually.
But in preparation for splitting this into a hard and soft limit,
ensure that they all call the proper function for setting the hw
limit for discards.
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2bb4cd5c

16 7月, 2015 1 次提交

NVMe: Reread partitions on metadata formats · 7bee6074

由 Keith Busch 提交于 7月 14, 2015

This patch has the driver automatically reread partitions if a namespace
has a separate metadata format. Previously revalidating a disk was
sufficient to get the correct capacity set on such formatted drives,
but partitions that may exist would not have been surfaced.
Reported-by: NPaul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Tested-by: NPaul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

7bee6074

02 7月, 2015 1 次提交

NVMe: Fix irq freeing when queue_request_irq fails · 758dd7fd

由 Jon Derrick 提交于 6月 30, 2015

Fixes an issue when queue_reuest_irq fails in nvme_setup_io_queues. This
patch initializes all vectors to -1 and resets the vector to -1 in the
case of a failure in queue_request_irq. This avoids the free_irq in
nvme_suspend_queue if the queue did not get an irq.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

758dd7fd

01 7月, 2015 1 次提交

rbd: use GFP_NOIO in rbd_obj_request_create() · 5a60e876

由 Ilya Dryomov 提交于 6月 24, 2015

rbd_obj_request_create() is called on the main I/O path, so we need to
use GFP_NOIO to make sure allocation doesn't blow back on us.  Not all
callers need this, but I'm still hardcoding the flag inside rather than
making it a parameter because a) this is going to stable, and b) those
callers shouldn't really use rbd_obj_request_create() and will be fixed
in the future.

More memory allocation fixes will follow.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

5a60e876

28 6月, 2015 6 次提交

drivers/block/nvme-core.c: fix build with gcc-4.4.4 · e44ac588

由 Andrew Morton 提交于 6月 27, 2015

gcc-4.4.4 (and possibly other versions) fail the compile when initializers
are used with anonymous unions.  Work around this.

drivers/block/nvme-core.c: In function 'nvme_identify_ctrl':
drivers/block/nvme-core.c:1163: error: unknown field 'identify' specified in initializer
drivers/block/nvme-core.c:1163: warning: missing braces around initializer
drivers/block/nvme-core.c:1163: warning: (near initialization for 'c.<anonymous>')
drivers/block/nvme-core.c:1164: error: unknown field 'identify' specified in initializer
drivers/block/nvme-core.c:1164: warning: excess elements in struct initializer
drivers/block/nvme-core.c:1164: warning: (near initialization for 'c')
...

This patch has no effect on text size with gcc-4.8.2.

Fixes: d29ec824 ("nvme: submit internal commands through the block layer")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

e44ac588

NVMe: Fix filesystem deadlock on removal · 3399a3f7

由 Keith Busch 提交于 6月 18, 2015

Move gendisk deletion before controller shutdown so filesystem may sync
dirty pages. Before, this would deadlock trying to allocate requests
on frozen queues that are about to be deleted.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

3399a3f7

NVMe: Failed controller initialization fixes · de3eff2b

由 Keith Busch 提交于 6月 18, 2015

This fixes an infinite device reset loop that may occur on devices that
fail initialization. If the drive fails to become ready for any reason
that does not involve an admin command timeout, the probe task should
assume the drive is unavailable and remove it from the topology. In
the case an admin command times out during device probing, the driver's
existing reset action will handle removing the drive.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

de3eff2b

NVMe: Unify controller probe and resume · ffe7704d

由 Keith Busch 提交于 6月 08, 2015

This unifies probe and resume so they both may be scheduled in the same
way. This is necessary for error handling that may occur during device
initialization since the task to cleanup the device wouldn't be able to
run if it is blocked on device initialization.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

ffe7704d

NVMe: Don't use fake status on cancelled command · 17188bb4

由 Keith Busch 提交于 6月 08, 2015

Synchronized commands do different things for timed out commands
vs. controller returned errors.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

17188bb4

NVMe: Fix device cleanup on initialization failure · 4af0e21c

由 Keith Busch 提交于 6月 08, 2015

Don't release block queue and tagging resoureces if the driver never
got them in the first place. This can happen if the controller fails to
become ready, if memory wasn't available to allocate a tagset or admin
queue, or if the resources were released as part of error recovery.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

4af0e21c

26 6月, 2015 13 次提交

zram: check comp algorithm availability earlier · d93435c3

由 Sergey Senozhatsky 提交于 6月 25, 2015

Improvement idea by Marcin Jabrzyk.

comp_algorithm_store() silently accepts any supplied algorithm name,
because zram performs algorithm availability check later, during the
device configuration phase in disksize_store() and emits the following
error:

  "zram: Cannot initialise %s compressing backend"

this error line is somewhat generic and, besides, can indicate a failed
attempt to allocate compression backend's working buffers.

add algorithm availability check to comp_algorithm_store():

  echo lzz > /sys/block/zram0/comp_algorithm
  -bash: echo: write error: Invalid argument
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NMarcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d93435c3

zram: cut trailing newline in algorithm name · 4bbacd51

由 Sergey Senozhatsky 提交于 6月 25, 2015

Supplied sysfs values sometimes contain new-line symbols (echo vs.  echo
-n), which we also copy as a compression algorithm name.  it works fine
when we lookup for compression algorithm, because we use sysfs_streq()
which takes care of new line symbols.  however, it doesn't look nice when
we print compression algorithm name if zcomp_create() failed:

 zram: Cannot initialise LXZ
            compressing backend

cut trailing new-line, so the error string will look like

  zram: Cannot initialise LXZ compressing backend

we also now can replace sysfs_streq() in zcomp_available_show() with
strcmp().
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4bbacd51

zram: cosmetic zram_bvec_write() cleanup · 17162f41

由 Sergey Senozhatsky 提交于 6月 25, 2015

`bool locked' local variable tells us if we should perform
zcomp_strm_release() or not (jumped to `out' label before
zcomp_strm_find() occurred), which is equivalent to `zstrm' being or not
being NULL.  remove `locked' and check `zstrm' instead.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

17162f41

zram: add dynamic device add/remove functionality · 6566d1a3

由 Sergey Senozhatsky 提交于 6月 25, 2015

We currently don't support on-demand device creation.  The one and only
way to have N zram devices is to specify num_devices module parameter
(default value: 1).  IOW if, for some reason, at some point, user wants
to have N + 1 devies he/she must umount all the existing devices, unload
the module, load the module passing num_devices equals to N + 1.  And do
this again, if needed.

This patch introduces zram control sysfs class, which has two sysfs
attrs:
- hot_add      -- add a new zram device
- hot_remove   -- remove a specific (device_id) zram device

hot_add sysfs attr is read-only and has only automatic device id
assignment mode (as requested by Minchan Kim).  read operation performed
on this attr creates a new zram device and returns back its device_id or
error status.

Usage example:
	# add a new specific zram device
	cat /sys/class/zram-control/hot_add
	2

	# remove a specific zram device
	echo 4 > /sys/class/zram-control/hot_remove

Returning zram_add() error code back to user (-ENOMEM in this case)

	cat /sys/class/zram-control/hot_add
	cat: /sys/class/zram-control/hot_add: Cannot allocate memory

NOTE, there might be users who already depend on the fact that at least
zram0 device gets always created by zram_init(). Preserve this behavior.

[minchan@kernel.org: use zram->claim to avoid lockdep splat]
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6566d1a3

zram: close race by open overriding · f405c445

由 Sergey Senozhatsky 提交于 6月 25, 2015

[ Original patch from Minchan Kim <minchan@kernel.org> ]

Commit ba6b17d6 ("zram: fix umount-reset_store-mount race
condition") introduced bdev->bd_mutex to protect a race between mount
and reset.  At that time, we don't have dynamic zram-add/remove feature
so it was okay.

However, as we introduce dynamic device feature, bd_mutex became
trouble.

	CPU 0

echo 1 > /sys/block/zram<id>/reset
  -> kernfs->s_active(A)
    -> zram:reset_store->bd_mutex(B)

	CPU 1

echo <id> > /sys/class/zram/zram-remove
  ->zram:zram_remove: bd_mutex(B)
  -> sysfs_remove_group
    -> kernfs->s_active(A)

IOW, AB -> BA deadlock

The reason we are holding bd_mutex for zram_remove is to prevent
any incoming open /dev/zram[0-9]. Otherwise, we could remove zram
others already have opened. But it causes above deadlock problem.

To fix the problem, this patch overrides block_device.open and
it returns -EBUSY if zram asserts he claims zram to reset so any
incoming open will be failed so we don't need to hold bd_mutex
for zram_remove ayn more.

This patch is to prepare for zram-add/remove feature.

[sergey.senozhatsky@gmail.com: simplify reset_store()]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f405c445

zram: return zram device_id from zram_add() · 92ff1528

由 Sergey Senozhatsky 提交于 6月 25, 2015

This patch prepares zram to enable on-demand device creation.
zram_add() performs automatic device_id assignment and returns
new device id (>= 0) or error code (< 0).
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92ff1528

zram: trivial: correct flag operations comment · b31177f2

由 Sergey Senozhatsky 提交于 6月 25, 2015

We don't have meta->tb_lock anymore and use meta table entry bit_spin_lock
instead. update corresponding comment.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b31177f2

zram: report every added and removed device · d12b63c9

由 Sergey Senozhatsky 提交于 6月 25, 2015

With dynamic device creation/removal (which will be introduced later in
the series) printing num_devices in zram_init() will not make a lot of
sense, as well as printing the number of destroyed devices in
destroy_devices().  Print per-device action (added/removed) in zram_add()
and zram_remove() instead.

Example:

[ 3645.259652] zram: Added device: zram5
[ 3646.152074] zram: Added device: zram6
[ 3650.585012] zram: Removed device: zram5
[ 3655.845584] zram: Added device: zram8
[ 3660.975223] zram: Removed device: zram6
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d12b63c9

zram: remove max_num_devices limitation · c3cdb40e

由 Sergey Senozhatsky 提交于 6月 25, 2015

Limiting the number of zram devices to 32 (default max_num_devices value)
is confusing, let's drop it.  A user with 2TB or 4TB of RAM, for example,
can request as many devices as he can handle.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c3cdb40e

zram: reorganize code layout · 522698d7

由 Sergey Senozhatsky 提交于 6月 25, 2015

This patch looks big, but basically it just moves code blocks.
No functional changes.

Our current code layout looks like a sandwitch.

For example,
a) between read/write handlers, we have update_used_max() helper function:

static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw

b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:

static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request

c) we first a bunch of sysfs read/store functions. then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.

Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):

-- one-liners: zram_test_flag/etc.

-- helpers: is_partial_io/update_position/etc

-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: reset and disksize store functions are required to be after
meta() functions. because we do device create/destroy actions in these
sysfs handlers.

-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page

-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page

-- device contol: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

522698d7

zram: use idr instead of `zram_devices' array · 85508ec6

由 Sergey Senozhatsky 提交于 6月 25, 2015

This patch makes some preparations for on-demand device add/remove
functionality.

Remove `zram_devices' array and switch to id-to-pointer translation (idr).
idr doesn't bloat zram struct with additional members, f.e.  list_head,
yet still provides ability to match the device_id with the device pointer.

No user-space visible changes.

[Julia.Lawall@lip6.fr: return -ENOMEM when `queue' alloc fails]
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

85508ec6

zram: cosmetic ZRAM_ATTR_RO code formatting tweak · 3bca3ef7

由 Sergey Senozhatsky 提交于 6月 25, 2015

Fix a misplaced backslash.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3bca3ef7

zram: remove obsolete ZRAM_DEBUG option · 9e65bf68

由 Marcin Jabrzyk 提交于 6月 25, 2015

This config option doesn't provide any usage for zram.
Signed-off-by: NMarcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9e65bf68

25 6月, 2015 7 次提交

rbd: queue_depth map option · b5584180

由 Ilya Dryomov 提交于 6月 23, 2015

nr_requests (/sys/block/rbd<id>/queue/nr_requests) is pretty much
irrelevant in blk-mq case because each driver sets its own max depth
that it can handle and that's the number of tags that gets preallocated
on setup.  Users can't increase queue depth beyond that value via
writing to nr_requests.

For rbd we are happy with the default BLKDEV_MAX_RQ (128) for most
cases but we want to give users the opportunity to increase it.
Introduce a new per-device queue_depth option to do just that:

    $ sudo rbd map -o queue_depth=1024 ...
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

b5584180

rbd: store rbd_options in rbd_device · d147543d

由 Ilya Dryomov 提交于 6月 22, 2015

Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

d147543d

rbd: terminate rbd_opts_tokens with Opt_err · 210c104c

由 Ilya Dryomov 提交于 6月 22, 2015

Also nuke useless Opt_last_bool and don't break lines unnecessarily.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

210c104c

rbd: bump queue_max_segments · d3834fef

由 Ilya Dryomov 提交于 6月 12, 2015

The default queue_limits::max_segments value (BLK_MAX_SEGMENTS = 128)
unnecessarily limits bio sizes to 512k (assuming 4k pages).  rbd, being
a virtual block device, doesn't have any restrictions on the number of
physical segments, so bump max_segments to max_hw_sectors, in theory
allowing a sector per segment (although the only case this matters that
I can think of is some readv/writev style thing).  In practice this is
going to give us 1M bios - the number of segments in a bio is limited
in bio_get_nr_vecs() by BIO_MAX_PAGES = 256.

Note that this doesn't result in any improvement on a typical direct
sequential test.  This is because on a box with a not too badly
fragmented memory the default BLK_MAX_SEGMENTS is enough to see nice
rbd object size sized requests.  The only difference is the size of
bios being merged - 512k vs 1M for something like

    $ dd if=/dev/zero of=/dev/rbd0 oflag=direct bs=$RBD_OBJ_SIZE
    $ dd if=/dev/rbd0 iflag=direct of=/dev/null bs=$RBD_OBJ_SIZE
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

d3834fef

rbd: timeout watch teardown on unmap with mount_timeout · 2894e1d7

由 Ilya Dryomov 提交于 5月 12, 2015

As part of unmap sequence, kernel client has to talk to the OSDs to
teardown watch on the header object. If none of the OSDs are available
it would hang forever, until interrupted by a signal - when that
happens we follow through with the rest of unmap procedure (i.e.
unregister the device and put all the data structures) and the unmap is
still considired successful (rbd cli tool exits with 0). The watch on
the userspace side should eventually timeout so that's fine.

This isn't very nice, because various userspace tools (pacemaker rbd
resource agent, for example) then have to worry about setting up their
own timeouts. Timeout it with mount_timeout (60 seconds by default).
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
Reviewed-by: NSage Weil <sage@redhat.com>

2894e1d7

libceph: store timeouts in jiffies, verify user input · a319bf56

由 Ilya Dryomov 提交于 5月 15, 2015

There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

a319bf56

Y
libceph: allow setting osd_req_op's flags · 144cba14
由 Yan, Zheng 提交于 4月 27, 2015
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
```
144cba14

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功