提交 · 8ab761e17efa75449db2d71dc6fabf96d110588c · openanolis / cloud-kernel

30 8月, 2017 14 次提交

drbd: rename "usermode_helper" to "drbd_usermode_helper" · 8ab761e1

由 Greg Kroah-Hartman 提交于 8月 29, 2017

Nothing like having a very generic global variable in a tiny driver
subsystem to make a mess of the global namespace...

Note, there are many other "generic" named global variables in the drbd
subsystem, someone should fix those up one day before they hit a linking
error.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8ab761e1

drbd: fix race between handshake and admin disconnect/down · cde81d99

由 Lars Ellenberg 提交于 8月 29, 2017

conn_try_disconnect() could potentialy hit the BUG_ON()
in _conn_set_state() where it iterates over _drbd_set_state()
and "asserts" via BUG_ON() that the latter was successful.

If the STATE_SENT bit was not yet visible to conn_is_valid_transition()
early in _conn_request_state(), but became visible before conn_set_state()
later in that call path, we could hit the BUG_ON() after _drbd_set_state(),
because it returned SS_IN_TRANSIENT_STATE.

To avoid that race, we better protect set_bit(SENT_STATE) with the spinlock.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cde81d99

drbd: fix potential deadlock when trying to detach during handshake · 33d32fa7

由 Lars Ellenberg 提交于 8月 29, 2017

When requesting a detach, we first suspend IO, and also inhibit meta-data IO
by means of drbd_md_get_buffer(), because we don't want to "fail" the disk
while there is IO in-flight: the transition into D_FAILED for detach purposes
may get misinterpreted as actual IO error in a confused endio function.

We wrap it all into wait_event(), to retry in case the drbd_req_state()
returns SS_IN_TRANSIENT_STATE, as it does for example during an ongoing
connection handshake.

In that example, the receiver thread may need to grab drbd_md_get_buffer()
during the handshake to make progress. To avoid potential deadlock with
detach, detach needs to grab and release the meta data buffer inside of
that wait_event retry loop. To avoid lock inversion between
mutex_lock(&device->state_mutex) and drbd_md_get_buffer(device),
introduce a new enum chg_state_flag CS_INHIBIT_MD_IO, and move the
call to drbd_md_get_buffer() inside the state_mutex grabbed in
drbd_req_state().
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

33d32fa7

drbd: A single dot should be put into a sequence. · 427fd2be

由 Markus Elfring 提交于 8月 29, 2017

Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NRoland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

427fd2be

drbd: fix rmmod cleanup, remove _all_ debugfs entries · 3f1a1b7c

由 Lars Ellenberg 提交于 8月 29, 2017

If there are still resources defined, but "empty", no more volumes
or connections configured, they don't hold module reference counts,
so rmmod is possible.

To avoid DRBD leftovers in debugfs, we need to call our global
drbd_debugfs_cleanup() only after all resources have been cleaned up.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3f1a1b7c

drbd: Use setup_timer() instead of init_timer() to simplify the code. · be7445a3

由 Geliang Tang 提交于 8月 29, 2017

Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NRoland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

be7445a3

drbd: fix potential get_ldev/put_ldev refcount imbalance during attach · 7c752ed3

由 Lars Ellenberg 提交于 8月 29, 2017

Race:

drbd_adm_attach()               | async drbd_md_endio()
                                |
device->ldev is still NULL.     |
                                |
drbd_md_read(                   |
 .endio = drbd_md_endio;        |
 submit;                        |
 ....                           |
 wait for done == 1;            |       done = 1;
);                              |       wake_up();
.. lot of other stuff,          |
.. includeing taking and        |
...giving up locks,             |
.. doing further IO,            |
.. stuff that takes "some time" |
                                | while in this context,
                                | this is the next statement.
                                | which means this context was scheduled
.. only then, finally,          | away for "some time".
device->ldev = nbc;             |
                                |       if (device->ldev)
                                |               put_ldev()

Unlikely, but possible. I was able to provoke it "reliably"
by adding an mdelay(500); after the wake_up().
Fixed by moving the if (!NULL) put_ldev() before done = 1;

Impact of the bug was that the resulting refcount imbalance
could lead to premature destruction of the object, potentially
causing a NULL pointer dereference during a subsequent detach.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7c752ed3

drbd: new disk-option disable-write-same · 9de7e14a

由 Lars Ellenberg 提交于 8月 29, 2017

Some backend devices claim to support write-same,
but would fail actual write-same requests.

Allow to set (or toggle) whether or not DRBD tries to support write-same.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9de7e14a

drbd: Fix resource role for newly created resources in events2 · c200d986

由 Philipp Reisner 提交于 8月 29, 2017

The conn_higest_role() (a terribly misnamed function) returns
the role of the resource. It returned R_UNKNOWN as long as the
resource has not a single device.

Resources without devices are short living objects.

But it matters for the NOTIFY_CREATE netwlink message. It makes
a lot more sense to report R_SECONDARY for the newly created
resource than R_UNKNOWN.

I reviewd all call sites of conn_highest_role(), that change
does not matter for the other call sites.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c200d986

drbd: mark symbols static where possible · 1ffa7bfa

由 Baoyou Xie 提交于 8月 29, 2017

We get a few warnings when building kernel with W=1:
drbd/drbd_receiver.c:1224:6: warning: no previous prototype for 'one_flush_endio' [-Wmissing-prototypes]
drbd/drbd_req.c:1450:6: warning: no previous prototype for 'send_and_submit_pending' [-Wmissing-prototypes]
drbd/drbd_main.c:924:6: warning: no previous prototype for 'assign_p_sizes_qlim' [-Wmissing-prototypes]
....

In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.
Signed-off-by: NBaoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1ffa7bfa

drbd: Send P_NEG_ACK upon write error in protocol != C · e1fbc4ca

由 Lars Ellenberg 提交于 8月 29, 2017

In protocol != C, we forgot to send the P_NEG_ACK for failing writes.

Once we no longer submit to local disk, because we already "detached",
due to the typical "on-io-error detach;" config setting,
we already send the neg acks right away.

Only those requests that have been submitted,
and have been error-completed by the local disk,
would forget to send the neg-ack,
and only in asynchronous replication (protocol != C).
Unless this happened during resync,
where we already always send acks, regardless of protocol.

The primary side needs the P_NEG_ACK in order to mark
the affected block(s) for resync in its out-of-sync bitmap.

If the blocks in question are not re-written again,
we may miss to resync them later, causing data inconsistencies.

This patch will always send the neg-acks, and also at least try to
persist the out-of-sync status on the local node already.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e1fbc4ca

drbd: add explicit plugging when submitting batches · de6978be

由 Lars Ellenberg 提交于 8月 29, 2017

When submitting batches of requests which had been queued on the
submitter thread, typically because they needed to wait for an
activity log transactions, use explicit plugging to help potential
merging of requests in the backend io-scheduler.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

de6978be

drbd: change list_for_each_safe to while(list_first_entry_or_null) · 9da10e8d

由 Lars Ellenberg 提交于 8月 29, 2017

Two instances of list_for_each_safe can drop their tmp element, they
really just peel off each element in turn from the start of the list.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9da10e8d

drbd: introduce drbd_recv_header_maybe_unplug · c51a0ef3

由 Lars Ellenberg 提交于 8月 29, 2017

Recently, drbd_recv_header() was changed to potentially
implicitly "unplug" the backend device(s), in case there
is currently nothing to receive.

Be more explicit about it: re-introduce the original drbd_recv_header(),
and introduce a new drbd_recv_header_maybe_unplug() for use by the
receiver "main loop".

Using explicit plugging via blk_start_plug(); blk_finish_plug();
really helps the io-scheduler of the backend with merging requests.

Wrap the receiver "main loop" with such a plug.
Also catch unplug events on the Primary,
and try to propagate.

This is performance relevant.  Without this, if the receiving side does
not merge requests, number of IOPS on the peer can me significantly
higher than IOPS on the Primary, and can easily become the bottleneck.

Together, both changes should help to reduce the number of IOPS
as seen on the backend of the receiving side, by increasing
the chance of merging mergable requests, without trading latency
for more throughput.
Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c51a0ef3

29 8月, 2017 5 次提交

skd: Let the block layer core choose .nr_requests · 6fd5b91d

由 Bart Van Assche 提交于 8月 29, 2017

Since blk_mq_init_queue() initializes .nr_requests to the tag set
size and since that value is a good default for the skd driver, do
not overwrite the value set by blk_mq_init_queue(). This change
doubles the default value of .nr_requests.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6fd5b91d

skd: Remove blk_queue_bounce_limit() call · bf231981

由 Bart Van Assche 提交于 8月 29, 2017

Since sTec s1120 devices support 64-bit DMA it is not necessary
to request data buffer bouncing. Hence remove the
blk_queue_bounce_limit() call.
Suggested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bf231981

nbd: make device_attribute const · dfbde552

由 Bhumika Goyal 提交于 8月 21, 2017

Make this const as is is only passed as an argument to the
function device_create_file and device_remove_file and the corresponding
arguments are of type const.
Done using Coccinelle
Signed-off-by: NBhumika Goyal <bhumirks@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dfbde552

null_blk: use available 'dev' in nullb_device_power_store() · b3c30512

由 Jens Axboe 提交于 8月 28, 2017

We already have this pointer, no need to use to_nullb_device()
again.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b3c30512

block/nullb: delete unnecessary memory free · 060fd198

由 Shaohua Li 提交于 8月 28, 2017

Commit 2984c868(nullb: factor disk parameters) has a typo. The
nullb_device allocation/free is done outside of null_add_dev. The commit
accidentally frees the nullb_device in error code path.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

060fd198

26 8月, 2017 6 次提交

skd: Remove SKD_ID_INCR · f5cb2d51

由 Bart Van Assche 提交于 8月 25, 2017

The SKD_ID_INCR flag in skd_request_context.id duplicates information
that is already available otherwise, e.g. through the block layer
request state and through skd_request_context.state. Hence remove
the code that manipulates this flag and also the flag itself.
Since skd_isr_completion_posted() only uses the lower bits of
skd_request_context.id as hardware tag, this patch does not change
the behavior of the skd driver. I'm referring to the following code:

    tag = req_id & SKD_ID_SLOT_AND_TABLE_MASK;
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f5cb2d51

skd: Make it easier for static analyzers to analyze skd_free_disk() · 4633504c

由 Bart Van Assche 提交于 8月 25, 2017

Although it is easy to see that skdev->disk != NULL if skdev->queue
!= NULL, add a test for skdev->disk to avoid that smatch reports the
following warning:

drivers/block/skd_main.c:3080 skd_free_disk()
         error: we previously assumed 'disk' could be null (see line 3074)
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4633504c

skd: Inline skd_end_request() · 795bc1b5

由 Bart Van Assche 提交于 8月 25, 2017

It is not worth to keep the debug statements in skd_end_request().
Without debug statements that function only consists of two
statements. Hence inline skd_end_request().
Suggested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

795bc1b5

skd: Rename skd_softirq_done() into skd_complete_rq() · 296cb94c

由 Bart Van Assche 提交于 8月 25, 2017

The latter name follows more closely the function names used in
other blk-mq drivers.
Suggested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

296cb94c

block/nullb: fix NULL dereference · 0d06a42f

由 Shaohua Li 提交于 8月 25, 2017

Dan reported this:

The patch 2984c868: "nullb: factor disk parameters" from Aug 14,
2017, leads to the following Smatch complaint:

drivers/block/null_blk.c:1759 null_init_tag_set()
	 error: we previously assumed 'nullb' could be null (see line
1750)

  1755		set->cmd_size	= sizeof(struct nullb_cmd);
  1756		set->flags = BLK_MQ_F_SHOULD_MERGE;
  1757		set->driver_data = NULL;
  1758
  1759		if (nullb->dev->blocking)
                    ^^^^^^^^^^^^^^^^^^^^
And an unchecked dereference.

nullb could be NULL here.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0d06a42f

null_blk: update email adress · 231b3db1

由 Jens Axboe 提交于 8月 25, 2017

Update to a working one, the fusionio address hasn't been valid
in 4 years.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

231b3db1

24 8月, 2017 5 次提交

block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992

由 Christoph Hellwig 提交于 8月 23, 2017

This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

74d46992

skd: Change default interrupt mode to MSI-X · 744353b6

由 Bart Van Assche 提交于 8月 23, 2017

Since MSI support on some motherboards is unreliable, change the
default interrupt mode from MSI to MSI-X. This patch avoids that
the following message appears sporadially in the kernel logs of
my test setup:

do_IRQ: 3.193 No irq handler for vector
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

744353b6

skd: Avoid double completions in case of a timeout · f2fe4459

由 Bart Van Assche 提交于 8月 23, 2017

Avoid that normal request completion and the timeout handler can
run concurrently by calling blk_mq_complete_request() instead of
blk_mq_end_request() from skd_end_request(). Avoid that the block
layer can reuse a request while the firmware is still processing
it. Convert skd_softirq_done() to blk-mq. Pass the pointer to
skd_softirq_done() to the block layer core through
blk_mq_ops.complete instead of by calling blk_queue_softirq_done().
Pass the pointer to skd_timed_out() to the block layer core
through blk_mq_ops.timeout instead of by calling
blk_queue_timed_out(). The timeout handler has been tested as
follows:

    echo 1 > /sys/block/skd0/io-timeout-fail &&
    (cd /sys/kernel/debug/fail_io_timeout &&
      echo 100 > probability &&
      echo N > task-filter &&
      echo 1 > times)

Fixes: commit a74d5b76 ("skd: Switch to block layer timeout mechanism")
Reported-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f2fe4459

skd: Inline skd_process_request() · c39c6c77

由 Bart Van Assche 提交于 8月 23, 2017

This patch does not change any functionality but makes the skd
driver code more similar to that of other blk-mq kernel drivers.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c39c6c77

skd: Report completion mismatches once · 49f16e2f

由 Bart Van Assche 提交于 8月 23, 2017

This patch removes one debug statement but otherwise does not change
any functionality.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

49f16e2f

23 8月, 2017 10 次提交

nullb: badbblocks support · 2f54a613

由 Shaohua Li 提交于 8月 14, 2017

Sometime disk could have tracks broken and data there is inaccessable,
but data in other parts can be accessed in normal way. MD RAID supports
such disks. But we don't have a good way to test it, because we can't
control which part of a physical disk is bad. For a virtual disk, this
can be easily controlled.

This patch adds a new 'badblock' attribute. Configure it in this way:
echo "+1-100" > xxx/badblock, this will make sector [1-100] as bad
blocks.
echo "-20-30" > xxx/badblock, this will make sector [20-30] good

If badblocks are accessed, the nullb disk will return IO error. Other
parts of the disk can accessed in normal way.
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2f54a613

nullb: emulate cache · deb78b41

由 Shaohua Li 提交于 8月 14, 2017

Software must flush disk cache to guarantee data safety. To check if
software correctly does disk cache flush, we must know the behavior of
disk. But physical disk behavior is uncontrollable. Even software
doesn't do the flush, the disk probably does the flush. This patch tries
to emulate a cache in the test disk.

All write will go to a cache first, when the cache is full, we then
flush some data to disk storage. A flush request will flush all data of
the cache to disk storage. A FUA write will write to memory store
directly and revalidate data in cache. If there is a power failure (by
writing to power attribute, 'echo 0 > disk_name/power'), we discard all
data in the cache, but preserve the data in disk storage. Later we can
power on the disk again as usual (write 1 to 'power' attribute), then we
can check data integrity and very if software does everything correctly.

A new attribute 'cache_size' (in MB) is added to configure cache size.

Based on original patch from Kyungchan Koh
Signed-off-by: NKyungchan Koh <kkc6196@fb.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

deb78b41

nullb: bandwidth control · eff2c4f1

由 Shaohua Li 提交于 8月 14, 2017

In test, we usually expect controllable disk speed. For example, in a
raid array, we'd like some disks are fast and some are slow. MD RAID
actually has a feature for this. To test the feature, we'd like to make
the disk run in specific speed.

block throttling probably can be used for this purpose, but it requires
cgroup setup. Here we just implement a simple throttling mechanism in
the driver. There is slight fluctuation in the mechanism, but it's good
enough for test.

To configure the bandwidth cap, user sets the 'mbps' attribute. mbps is
MB/s.

Based on original patch from Kyungchan Koh
Signed-off-by: NKyungchan Koh <kkc6196@fb.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eff2c4f1

nullb: support discard · 306eb6b4

由 Shaohua Li 提交于 8月 14, 2017

discard makes sense for memory backed disk. And also it's useful to test
if upper layer supports dicard correctly.

User configures 'discard' attribute to enable/disable dicard support.

Based on original patch from Kyungchan Koh
Signed-off-by: NKyungchan Koh <kkc6196@fb.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

306eb6b4

nullb: support memory backed store · 5bcd0e0c

由 Shaohua Li 提交于 8月 14, 2017

This adds memory backed store in nullb.

User configure 'memory_backed' attribute for this. By default, nullb
disk doesn't use memory backed store.

Based on original patch from Kyungchan Koh
Signed-off-by: NKyungchan Koh <kkc6196@fb.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5bcd0e0c

nullb: use ida to manage index · 94bc02e3

由 Shaohua Li 提交于 8月 14, 2017

We now dynamically create disks. Managing the disk index with ida to
avoid bump up the index too much.
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

94bc02e3

nullb: add interface to power on disk · cedcafad

由 Shaohua Li 提交于 8月 14, 2017

The device created in nullb configfs interface isn't power on by
default. After user configures the device, user can do 'echo 1 >
xxx/nullb/device_name/power' to power on the device, which will create a
disk. the xxx/nullb/device_name/index is the disk index, so if the index
is 2, the new created disk should be named as /dev/nullb2. Note, the
'index' is only valid after disk is power on.

'echo 0 > xxx/nullb/device_name/power' will remove the disk. Note, this
doesn't remove the device. To remove the device, user should do 'rmdir
xxx/nullb/device_name'. Removing the device will remove the disk too.
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cedcafad

nullb: add configfs interface · 3bf2bd20

由 Shaohua Li 提交于 8月 14, 2017

Add configfs interface for nullb. configfs interface is more flexible
and easy to configure in a per-disk basis.

Configuration is something like this:
mount -t configfs none /mnt

Checking which features the driver supports:
cat /mnt/nullb/features

The 'features' attribute is for future extension. We probably will add
new features into the driver, userspace can check this attribute to find
the supported features.

Create/remove a device:
mkdir/rmdir /mnt/nullb/a

Then configure the device by setting attributes under /mnt/nullb/a, most
of nullb supported module parameters are converted to attributes:
size; /* device size in MB */
completion_nsec; /* time in ns to complete a request */
submit_queues; /* number of submission queues */
home_node; /* home node for the device */
queue_mode; /* block interface */
blocksize; /* block size */
irqmode; /* IRQ completion handler */
hw_queue_depth; /* queue depth */
use_lightnvm; /* register as a LightNVM device */
blocking; /* blocking blk-mq device */
use_per_node_hctx; /* use per-node allocation for hardware context */

Note, creating a device doesn't create a disk immediately. Creating a
disk is done in two phases: create a device and then power on the
device. Next patch will introduce device power on.

Based on original patch from Kyungchan Koh
Signed-off-by: NKyungchan Koh <kkc6196@fb.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3bf2bd20

nullb: factor disk parameters · 2984c868

由 Shaohua Li 提交于 8月 14, 2017

When we switch to configfs interface, each disk could have different
configuration. To prepare for the change, we move most disk setting to a
separate data structure. The existing module parameter interface is
kept. The 'nr_devices' and 'shared_tags' don't make sense for per-disk
setting, so they are remained as global settings.
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2984c868

skd: error pointer dereference in skd_cons_disk() · 92d499d4

由 Dan Carpenter 提交于 8月 23, 2017

My initial impulse was to check for IS_ERR_OR_NULL() but when I looked
at this code a bit more closely, we should only need to check for
IS_ERR().

The blk_mq_alloc_tag_set() returns negative error codes and zero on
success so we can just do an "if (rc) goto err_out;".  It's better to
preserve the error code anyhow.  The blk_mq_init_queue() returns error
pointers on failure, it never returns NULL.  We can also remove the
"q = NULL;" at the start because that's no longer needed.

Fixes: ca33dd92 ("skd: Convert to blk-mq")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

92d499d4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功