- 30 5月, 2014 1 次提交
-
-
由 Ming Lei 提交于
Firstly, it isn't necessary to hold lock of vblk->vq_lock when notifying hypervisor about queued I/O. Secondly, virtqueue_notify() will cause world switch and it may take long time on some hypervisors(such as, qemu-arm), so it isn't good to hold the lock and block other vCPUs. On arm64 quad core VM(qemu-kvm), the patch can increase I/O performance a lot with VIRTIO_RING_F_EVENT_IDX enabled: - without the patch: 14K IOPS - with the patch: 34K IOPS fio script: [global] direct=1 bsrange=4k-4k timeout=10 numjobs=4 ioengine=libaio iodepth=64 filename=/dev/vdc group_reporting=1 [f1] rw=randread Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: NMing Lei <ming.lei@canonical.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org # 3.13+ Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 5月, 2014 4 次提交
-
-
由 Valentin Priescu 提交于
Currently xenwatch blocks in VBD disconnect, waiting for all pending I/O requests to finish. If the VBD is attached to a hot-swappable disk, then xenwatch can hang for a long period of time, stalling other watches. INFO: task xenwatch:39 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ffff880057f01bd0 0000000000000246 ffff880057f01ac0 ffffffff810b0782 ffff880057f01ad0 00000000000131c0 0000000000000004 ffff880057edb040 ffff8800344c6080 0000000000000000 ffff880058c00ba0 ffff880057edb040 Call Trace: [<ffffffff810b0782>] ? irq_to_desc+0x12/0x20 [<ffffffff8128f761>] ? list_del+0x11/0x40 [<ffffffff8147a080>] ? wait_for_common+0x60/0x160 [<ffffffff8147bcef>] ? _raw_spin_lock_irqsave+0x2f/0x50 [<ffffffff8147bd49>] ? _raw_spin_unlock_irqrestore+0x19/0x20 [<ffffffff8147a26a>] schedule+0x3a/0x60 [<ffffffffa018fe6a>] xen_blkif_disconnect+0x8a/0x100 [xen_blkback] [<ffffffff81079f70>] ? wake_up_bit+0x40/0x40 [<ffffffffa018ffce>] xen_blkbk_remove+0xae/0x1e0 [xen_blkback] [<ffffffff8130b254>] xenbus_dev_remove+0x44/0x90 [<ffffffff81345cb7>] __device_release_driver+0x77/0xd0 [<ffffffff81346488>] device_release_driver+0x28/0x40 [<ffffffff813456e8>] bus_remove_device+0x78/0xe0 [<ffffffff81342c9f>] device_del+0x12f/0x1a0 [<ffffffff81342d2d>] device_unregister+0x1d/0x60 [<ffffffffa0190826>] frontend_changed+0xa6/0x4d0 [xen_blkback] [<ffffffffa019c252>] ? frontend_changed+0x192/0x650 [xen_netback] [<ffffffff8130ae50>] ? cmp_dev+0x60/0x60 [<ffffffff81344fe4>] ? bus_for_each_dev+0x94/0xa0 [<ffffffff8130b06e>] xenbus_otherend_changed+0xbe/0x120 [<ffffffff8130b4cb>] frontend_changed+0xb/0x10 [<ffffffff81309c82>] xenwatch_thread+0xf2/0x130 [<ffffffff81079f70>] ? wake_up_bit+0x40/0x40 [<ffffffff81309b90>] ? xenbus_directory+0x80/0x80 [<ffffffff810799d6>] kthread+0x96/0xa0 [<ffffffff81485934>] kernel_thread_helper+0x4/0x10 [<ffffffff814839f3>] ? int_ret_from_sys_call+0x7/0x1b [<ffffffff8147c17c>] ? retint_restore_args+0x5/0x6 [<ffffffff81485930>] ? gs_change+0x13/0x13 With this patch, when there is still pending I/O, the actual disconnect is done by the last reference holder (last pending I/O request). In this case, xenwatch doesn't block indefinitely. Signed-off-by: NValentin Priescu <priescuv@amazon.com> Reviewed-by: NSteven Kady <stevkady@amazon.com> Reviewed-by: NSteven Noonan <snoonan@amazon.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Olaf Hering 提交于
Newer toolstacks may provide a boolean property "discard-enable" in the backend node. Its purpose is to disable discard for file backed storage to avoid fragmentation. Recognize this setting also for physical storage. If that property exists and is false, do not advertise "feature-discard" to the frontend. Signed-off-by: NOlaf Hering <olaf@aepfle.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Olaf Hering 提交于
In its initial implementation a check for "type" was added, but only phy and file are handled. This breaks advertised discard support for other type values such as qdisk. Fix and simplify this function: If the backend advertises discard support it is supposed to implement it properly, so enable feature_discard unconditionally. If the backend advertises the need for a certain granularity and alignment then propagate both properties to the blocklayer. The discard-secure property is a boolean, update the code to reflect that. Signed-off-by: NOlaf Hering <olaf@aepfle.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Christoph Hellwig 提交于
There is no need for drivers to control hardware context allocation now that we do the context to node mapping in common code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 28 5月, 2014 2 次提交
-
-
由 Jiri Kosina 提交于
Commit 41a55b4d ("floppy: silence warning during disk test") caused bio.bi_flags being overwritten, and its initialization to BIO_UPTODATE in bio_init() to be lost. This was unnoticed until 7b7b68bb ("floppy: bail out in open() if drive is not responding to block0 read"), because the error value wasn't checked for in the bio completion callback. Now we are actually looking at the error, and the loss of BIO_UPTODATE causes EIO to be wrongly passed to the callback, which confuses the FD_OPEN_SHOULD_FAIL_BIT logic. Fix this by not destroying previous value of bi_flags when setting BIO_QUIET. Cc: Stephen Hemminger <shemminger@vyatta.com> Reported-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Jens Axboe 提交于
Drivers currently have to figure this out on their own, and they are missing information to do it properly. The ones that did attempt to do it, do it wrong. So just pass in the suggested node directly to the alloc function. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 21 5月, 2014 1 次提交
-
-
由 Asai Thambi S P 提交于
Move error handling to service thread, and use mtip_set_timeout() to set timeouts for HDIO_DRIVE_TASK and HDIO_DRIVE_CMD IOCTL commands. Signed-off-by: NSelvan Mani <smani@micron.com> Signed-off-by: NAsai Thambi S P <asamymuthupa@micron.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 16 5月, 2014 1 次提交
-
-
由 Ming Lei 提交于
When there isn't enough vring descriptor for adding to vq, blk-mq will be put as stopped state until some of pending descriptors are completed & freed. Unfortunately, the vq's interrupt may come just before blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will still be kept as stopped even though lots of descriptors are completed and freed in the interrupt handler. The worst case is that all pending descriptors are freed in the interrupt handler, and the queue is kept as stopped forever. This patch fixes the problem by starting/stopping blk-mq with holding vq_lock. Cc: Jens Axboe <axboe@kernel.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 5月, 2014 3 次提交
-
-
由 Jens Axboe 提交于
We need to stop the block layer queues to prevent new "normal" IO from entering the driver, while we wait for existing commands to finish. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Dan Carpenter 提交于
We changed this from blk_alloc_queue_node() to blk_mq_init_queue() so the check needs to be updated as well. Fixes: ffc771b3 ('mtip32xx: convert to use blk-mq') Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
This rips out timeout handling, requeueing, etc in converting it to use blk-mq instead. Acked-by: NAsai Thambi S P <asamymuthupa@micron.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 5月, 2014 12 次提交
-
-
由 Joe Perches 提交于
Move the function to the proper spot instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Move the function to appropriate locations instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Move function to proper location instead. Fix whitespace and embedded if too. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Move the function to the right spot instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Move the function instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Neaten the spacing too. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
It's defined below without being called. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
The actual static is defined below it but not used until later. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Move static function to the appropriate place to remove the now unnecessary prototype. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Macros with hidden control flow aren't nice. Just use copy_to/from_user directly instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
It's unused, make it disappear. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
It's a debugging message, mark it so. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 5月, 2014 16 次提交
-
-
由 Ming Lei 提交于
entry(cmd->ll_list) may belong to new request once end_cmd() returns, so fix the bug with the patch. Without the change, it is easy to observe oops when doing null_blk(timer) test. Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
If there are no peer_devices or connections, I'd rather have NULL than some "arbitrary" address pretending to point to a struct. Helps to avoid hard to debug symptoms, in case we ever try to use and dereference a drbd_connection or drbd_peer_device where we in fact don't have any connection at all. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Philipp Reisner 提交于
A newly created device was never exposed before, i.e. has a exposed_data_uuid of 0. Then it is valid to attach to any current_uuid of a backing device (of course also to a newly created one (4)) Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Philipp Reisner 提交于
In case a connection transitions into C_TIMEOUT within the timer function (request_timer_fn()) we need to make sure that the receiver thread (potentially running on a different CPU) sees the updated cstate later on. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Philipp Reisner 提交于
...instead directly assigning to q->limits.discard_zeroes_data Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
Just because it is the oldest not yet completed request does not make it the oldest request waiting for disk. Or waiting for the peer. And we completely missed already completed requests that would still hold references to activity log extents, waiting only for the barrier ack. Find two oldest not yet completely processed requests, one that is still waiting for local completion, and one that is still waiting for some response from the peer. These may or may not be the same request object. Then separately apply the network and disk timeouts, respectively. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Philipp Reisner 提交于
In the implementation as it was, the two peers sent each other a challenge, and expects the challenge hashed with the shared secret back. A attacker could simply wait for the challenge of the peer, and send the same challenge back. Then it waits for the response, and sends the same response back. Prevent this by not accepting a challenge from the peer that is the same as the challenge sent to the peer. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
Once our sender thread needs to wait_for_work(), and actually needs to schedule(), just before we do that, we already check if it is useful to implicitly close the last epoch. The condition was too strict: only implicitly close the epoch, if there have been no new (write) requests at all. The assumption was that if there were new requests, they would always be communicated one way or another, and would send necessary epoch separating barriers explicitly. This is not always true, e.g. when becoming diskless, or while explicitly starting a full resync. The last communicated epoch could stay open for a long time, locking down corresponding activity log extents. It is safe to always implicitly send that last barrier, as soon as we determin that there cannot be more requests in the last communicated epoch, even if there have been (uncommunicated) new requests in new epochs meanwhile. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
When batching more updates to the activity log into single transactions, we lost the ability for new requests to force themselves into the active set: all preparation steps became non-blocking, and if all currently hot extents keep busy, they could starve out new incoming requests to cold extents for quite a while. This can only happen if your IO backend accepts more IO operations per average DRBD replication round trip time than you have al-extents configured. If we have incoming requests to cold extents, at least do one blocking update per transaction. In an artificial worst-case workload on SSD with an asynchronous 600 ms replication link, with al-extents = 7 (the minimum we allow), and concurrent full resynch, without this patch, some write requests have been observed to be starved for 40 seconds. With this patch, application observed a worst case latency of twice the replication round trip time. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
We want to store in persistent meta data what the peer DRBD can handle, which, due to spreading requests to multiple bios, may be more than its backing device can handle. Otherwise, if a disconnected Primary temporarily loses access to its local data as well, we may accidentally shrink the max-bio setting, portentially causing already assembled, but not yet processed, application bios to be spuriously failed due to device limits. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
In the drbd make request function, specifically in drbd_send_and_submit(), we decide whether we want to send the actual write request, or only a "set this block out of sync" information. We do so based on the current connection state, while holding the req_lock. The connection state is not supposed to change while holding the req_lock. But in drbd_start_resync, we did change that state anyways, while only holding the global_state_lock, which is enough to change sync-after dependencies (paused vs active resync), but not good enough to change the connection state. Fix: in drbd_start_resync, first grab the req_lock to serialize with drbd_send_and_submit(), before grabbing the global_state_lock to be able to evaluate the sync-after dependencies. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
Allow the user of REQ_DISCARD. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
Note that I do NOT call __drbd_chk_io_error for failed REQ_DISCARD. That may be wrong, though, or needs to differ between EOPNOTSUPP and other errors... Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
If the receiver needs to serve a discard request on a queue that does not announce to be discard cabable, it falls back to do synchronous blkdev_issue_zeroout(). We expect only "reasonably" large (up to one activity log extent?) discard requests. We do this to not to not block the receiver for too long in this fallback code path, and to not set/clear too many bits inside one spinlock_irq_save() in drbd_set_in_sync/drbd_set_out_of_sync, Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
We plan to use genl_family->parallel_ops = true in the future, but need to review all possible interactions first. For now, only selectively drop genl_lock() in drbd_set_role(), instead serializing on our own internal resource->conf_update mutex. We now can be promoted/demoted on many resources in parallel, which may significantly improve cluster failover times when fencing is required. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
Because all administrative requests via genetlink have been globally serialized via genl_lock(), we used to have one static struct drbd_config_context "admin context". Move this on-stack to the respective callback functions. This will allow us to selectively drop the genl_lock() (or use genl_family->parallel_ops) in the future. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-