Commits · 6cf7677f1a94546e472658290b3b8bdbb16cc045 · openanolis / cloud-kernel

09 Feb, 2017 · 1 commit

block: move req_set_nomerge to blk.h · 6cf7677f

Committed by Christoph Hellwig on Feb 08, 2017

This makes it available outside of blk-merge.c, and inlining such a trivial
helper seems pretty useful to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

6cf7677f
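
To see why a header is a natural home for such a helper, here is a user-space sketch. The struct layouts and the REQ_NOMERGE value are simplified stand-ins, and the body assumes the helper flags the request as non-mergeable and clears the queue's cached last-merge hint if it points at that request; treat it as an illustration, not the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption). */
#define REQ_NOMERGE (1u << 0) /* hypothetical bit value */

struct request {
	unsigned int cmd_flags;
};

struct request_queue {
	struct request *last_merge; /* cached "try merging here first" hint */
};

/*
 * static inline in a shared header: any file that includes it can call
 * the helper, and the compiler can expand the two statements in place.
 */
static inline void req_set_nomerge(struct request_queue *q,
				   struct request *req)
{
	req->cmd_flags |= REQ_NOMERGE;
	/* Drop the merge hint if it points at a now non-mergeable request. */
	if (req == q->last_merge)
		q->last_merge = NULL;
}
```

Note that the hint is only cleared when it refers to the request being marked; an unrelated cached candidate is left alone.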

07 Feb, 2017 · 1 commit

blk-mq-sched: (un)register elevator when (un)registering queue · 80c6b157

Committed by Omar Sandoval on Feb 06, 2017

I noticed that when booting with a default blk-mq I/O scheduler, the
/sys/block/*/queue/iosched directory was missing. However, switching
after boot did create the directory. This is because we skip the initial
elevator register/unregister when we don't have a ->request_fn(), but we
should still do it for the ->mq_ops case.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

80c6b157

05 Feb, 2017 · 1 commit

dm: don't allow ioctls to targets that don't map to whole devices · e980f623

Committed by Christoph Hellwig on Feb 04, 2017

.. at least for unprivileged users.  Before, we called into the SCSI
ioctl code to allow exemptions for a few SCSI passthrough ioctls,
but this is pretty unsafe, and except for this call dm knows nothing
about SCSI ioctls.

As the SCSI ioctl code is now optional, we really don't want to
drag it in for DM, and the exception is not very useful anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

e980f623

04 Feb, 2017 · 2 commits

block: free merged request in the caller · e4d750c9

Committed by Jens Axboe on Feb 03, 2017

If we end up doing a request-to-request merge when we have completed
a bio-to-request merge, we free the request from deep down in that
path. For blk-mq-sched, the merge path has to hold the appropriate
lock, but we don't need it for freeing the request. And in fact
holding the lock is problematic, since we are now calling the
mq sched put_rq_private() hook with the lock held. Other call paths
do not hold this lock.

Fix this inconsistency by ensuring that the caller frees a merged
request. Then we can do it outside of the lock, making it both more
efficient and fixing the blk-mq-sched problem of invoking parts of
the scheduler with an unknown lock state.
Reported-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

e4d750c9
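
Together with b973cb7e below, which makes the merge return the request it made redundant instead of 0/1, the pattern looks roughly like this. It is a user-space sketch: a pthread mutex stands in for the scheduler lock, and try_back_merge() and merge_then_free() are hypothetical simplifications, not the blk-merge.c functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified request and queue (assumption). */
struct request {
	unsigned long sector;     /* first sector */
	unsigned int nr_sectors;  /* length in sectors */
};

struct queue {
	pthread_mutex_t lock;     /* stands in for the scheduler/queue lock */
};

/*
 * Back-merge 'next' into 'rq' if the ranges are contiguous. Returns the
 * request that is now redundant (the caller frees it), or NULL if no
 * merge happened. Caller holds q->lock; nothing is freed here.
 */
static struct request *try_back_merge(struct request *rq, struct request *next)
{
	assert(rq != next);
	if (rq->sector + rq->nr_sectors != next->sector)
		return NULL;
	rq->nr_sectors += next->nr_sectors;
	return next;
}

/* Returns 1 if a merge happened, 0 otherwise. */
static int merge_then_free(struct queue *q, struct request *rq,
			   struct request *next)
{
	struct request *free_rq;

	pthread_mutex_lock(&q->lock);
	free_rq = try_back_merge(rq, next);
	pthread_mutex_unlock(&q->lock);

	/* Free outside the lock, so free-path hooks never run with it held. */
	if (!free_rq)
		return 0;
	free(free_rq);
	return 1;
}
```

The design choice is that the merge step only reports what became redundant; the caller, which knows its own lock state, decides when to free.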

blk-merge: return the merged request · b973cb7e

Committed by Jens Axboe on Feb 02, 2017

When we attempt to merge request-to-request, we return a 0/1 if we
ended up merging or not. Change that to return the pointer to the
request that we freed. We will use this to move the freeing of
that request out of the merge logic, so that callers can drop
locks before freeing the request.

There should be no functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

b973cb7e

03 Feb, 2017 · 9 commits

blkcg: fix double free of new_blkg in blkcg_init_queue · 9b54d816

Committed by Hou Tao on Feb 03, 2017

If blkg_create fails, new_blkg passed as an argument will
be freed by blkg_create, so there is no need to free it again.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9b54d816
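
The bug class is an ownership mix-up on an error path. A minimal sketch, with hypothetical create_group() and init_queue() standing in for blkg_create() and blkcg_init_queue(): the callee takes ownership of the preallocated object and frees it on failure, so the caller's error path must not free it again.

```c
#include <assert.h>
#include <stdlib.h>

struct blkg {
	int id;
};

/*
 * Hypothetical model of the ownership rule: create_group() always
 * consumes 'new_blkg'. On failure it frees it, so the caller must not.
 */
static struct blkg *create_group(struct blkg *new_blkg, int fail)
{
	assert(new_blkg != NULL);
	if (fail) {
		free(new_blkg);
		return NULL;
	}
	new_blkg->id = 1;
	return new_blkg;
}

/* Returns 0 on success, -1 on failure. */
static int init_queue(int fail)
{
	struct blkg *new_blkg = malloc(sizeof(*new_blkg));
	struct blkg *blkg;

	if (!new_blkg)
		return -1;

	blkg = create_group(new_blkg, fail);
	if (!blkg)
		return -1;	/* no free(new_blkg): already freed by callee */

	free(blkg);		/* sketch only: tear down immediately */
	return 0;
}
```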

blk-mq-sched: bypass the scheduler for flushes entirely · 0cacba6c

Committed by Omar Sandoval on Feb 02, 2017

There's a weird inconsistency that flushes are mostly hidden from the
scheduler, but it needs to be aware of them in ->insert_requests().
Instead of having every scheduler call blk_mq_sched_bypass_insert(),
let's do it in the common framework.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

0cacba6c

zram_drv: update for backing dev info changes · e1735496

Committed by Jens Axboe on Feb 02, 2017

A previous commit made the bdi embedded in the request queue
a pointer, but neglected to fix up zram. Fix it up.

Fixes: dc3b17cc ("block: Use pointer to backing_dev_info from request_queue")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

e1735496

blktrace: use existing disk debugfs directory · 6ac93117

Committed by Omar Sandoval on Jan 31, 2017

We may already have a directory to put the blktrace stuff in if

1. The disk uses blk-mq
2. CONFIG_BLK_DEBUG_FS is enabled
3. We are tracing the whole disk and not a partition

Instead of hardcoding this very specific case, let's use the new
debugfs_lookup(). If the directory exists, we use it, otherwise we
create one and clean it up later.

Fixes: 07e4fead ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

6ac93117
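
The lookup-or-create logic, plus the bookkeeping that makes cleanup safe, can be sketched in user space. A toy string registry stands in for debugfs, and dir_lookup(), dir_create(), dir_remove() and struct trace are hypothetical; the point is remembering whether setup created the directory, so teardown only removes what it owns. The sketch does not model the reference counting of real debugfs dentries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy registry standing in for debugfs directories (assumption). */
#define MAX_DIRS 8
static const char *dirs[MAX_DIRS];

static const char *dir_lookup(const char *name)
{
	for (int i = 0; i < MAX_DIRS; i++)
		if (dirs[i] && strcmp(dirs[i], name) == 0)
			return dirs[i];
	return NULL;
}

static const char *dir_create(const char *name)
{
	for (int i = 0; i < MAX_DIRS; i++)
		if (!dirs[i])
			return dirs[i] = name;
	return NULL;	/* registry full */
}

static void dir_remove(const char *dir)
{
	assert(dir != NULL);
	for (int i = 0; i < MAX_DIRS; i++)
		if (dirs[i] == dir)
			dirs[i] = NULL;
}

struct trace {
	const char *dir;
	bool created;	/* true only if setup created the directory */
};

/* Reuse an existing directory if there is one, otherwise create it. */
static int trace_setup(struct trace *t, const char *name)
{
	t->created = false;
	t->dir = dir_lookup(name);
	if (!t->dir) {
		t->dir = dir_create(name);
		if (!t->dir)
			return -1;
		t->created = true;
	}
	return 0;
}

/* Remove the directory only if this tracer created it. */
static void trace_cleanup(struct trace *t)
{
	if (t->created)
		dir_remove(t->dir);
	t->dir = NULL;
	t->created = false;
}
```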

blk-mq: move debugfs_remove() of disk dir to blk_release_queue() · 62ebce16

Committed by Omar Sandoval on Jan 31, 2017

This needs to happen after we tear down blktrace.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

62ebce16

block: use same block debugfs directory for blk-mq and blktrace · 18fbda91

Committed by Omar Sandoval on Jan 31, 2017

When I added the blk-mq debugging information to debugfs, I didn't
notice that blktrace also creates a "block" directory in debugfs. Make
them use the same dentry, now created in the core block code. Based on a
patch from Jens.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

18fbda91

blktrace: make do_blk_trace_setup() static · a428d314

Committed by Omar Sandoval on Jan 31, 2017

This isn't used outside of blktrace.c anymore.

Fixes: 62c2a7d9 ("block: push BKL into blktrace ioctls")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

a428d314

block: fix debugfs config conditional in struct request_queue · 03796c14

Committed by Omar Sandoval on Jan 31, 2017

The debugfs dentries are only used for CONFIG_BLK_DEBUG_FS, so make them
conditional on that instead of CONFIG_DEBUG_FS.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

03796c14

debugfs: add debugfs_lookup() · a7c5437b

Committed by Omar Sandoval on Jan 31, 2017

We don't always have easy access to the dentry of a file or directory we
created in debugfs. Add a helper which allows us to get a dentry we
previously created.

The motivation for this change is a problem with blktrace and the blk-mq
debugfs entries introduced in 07e4fead ("blk-mq: create debugfs
directory tree"). Namely, in some cases, the directory that blktrace
needs to create may already exist, but in other cases, it may not. We
_could_ rely on a bunch of implied knowledge to decide whether to create
the directory or not, but it's much cleaner on our end to just look it
up.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

a7c5437b

02 Feb, 2017 · 12 commits

scsi, block: fix duplicate bdi name registration crashes · 0dba1314

Committed by Dan Williams on Feb 01, 2017

Warnings of the following form occur because scsi reuses a devt number
while the block layer still has it referenced as the name of the bdi
[1]:

 WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192'
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_fmt+0x5f/0x80
  ? kernfs_path_from_node+0x4f/0x60
  sysfs_warn_dup+0x62/0x80
  sysfs_create_dir_ns+0x77/0x90
  kobject_add_internal+0xb2/0x350
  kobject_add+0x75/0xd0
  device_add+0x15a/0x650
  device_create_groups_vargs+0xe0/0xf0
  device_create_vargs+0x1c/0x20
  bdi_register+0x90/0x240
  ? lockdep_init_map+0x57/0x200
  bdi_register_owner+0x36/0x60
  device_add_disk+0x1bb/0x4e0
  ? __pm_runtime_use_autosuspend+0x5c/0x70
  sd_probe_async+0x10d/0x1c0
  async_run_entry_fn+0x39/0x170

This is a brute-force fix to pass the devt release information from
sd_probe() to the locations where we register the bdi,
device_add_disk(), and unregister the bdi, blk_cleanup_queue().

Thanks to Omar for the quick reproducer script [2]. This patch survives
where an unmodified kernel fails in a few seconds.

[1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4
[2]: http://marc.info/?l=linux-block&m=148554717109098&w=2

Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Reported-by: Omar Sandoval <osandov@osandov.com>
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

0dba1314

block: Get rid of blk_get_backing_dev_info() · efa7c9f9

Committed by Jan Kara on Feb 02, 2017

blk_get_backing_dev_info() is now a simple dereference. Remove that
function and simplify some code around that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

efa7c9f9

block: Make blk_get_backing_dev_info() safe without open bdev · b1d2dc56

Committed by Jan Kara on Feb 02, 2017

Currently blk_get_backing_dev_info() is not safe to be called when the
block device is not open, as bdev->bd_disk is NULL in that case. However,
inode_to_bdi() uses this function and may be called from the flusher
worker or other writeback-related functions without the bdev being open,
which leads to crashes such as:

[113031.075540] Unable to handle kernel paging request for data at address 0x00000000
[113031.075614] Faulting instruction address: 0xc0000000003692e0
0:mon> t
[c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
[c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
[c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
[c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
[c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
[c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
[c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
[c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
Signed-off-by: Jens Axboe <axboe@fb.com>

b1d2dc56

block: Dynamically allocate and refcount backing_dev_info · d03f6cdc

Committed by Jan Kara on Feb 02, 2017

Instead of storing backing_dev_info inside struct request_queue,
allocate it dynamically, reference count it, and free it when the last
reference is dropped. Currently only request_queue holds the reference,
but in the following patch we add other users referencing
backing_dev_info.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

d03f6cdc
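
A minimal sketch of the get/put lifetime rule described above. It is single-threaded and uses a plain int counter where real kernel code needs atomic operations; bdi_alloc(), bdi_get() and bdi_put() here are illustrative toy versions, not the kernel's implementation, and bdi_frees exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

static int bdi_frees; /* observation hook: objects actually freed */

struct backing_dev_info {
	int refcnt; /* plain int: single-threaded sketch, not atomic */
};

static struct backing_dev_info *bdi_alloc(void)
{
	struct backing_dev_info *bdi = calloc(1, sizeof(*bdi));

	if (bdi)
		bdi->refcnt = 1; /* the creator's reference */
	return bdi;
}

static struct backing_dev_info *bdi_get(struct backing_dev_info *bdi)
{
	bdi->refcnt++;
	return bdi;
}

/* Drop one reference; free only when the last one goes away. */
static void bdi_put(struct backing_dev_info *bdi)
{
	assert(bdi->refcnt > 0);
	if (--bdi->refcnt == 0) {
		free(bdi);
		bdi_frees++;
	}
}

struct request_queue {
	struct backing_dev_info *backing_dev_info; /* pointer, not embedded */
};
```

The queue releases its reference when it goes away, but any other holder (for example a block device inode, as in the following commits) keeps the object alive until its own put.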

block: Use pointer to backing_dev_info from request_queue · dc3b17cc

Committed by Jan Kara on Feb 02, 2017

We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step, add a pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

dc3b17cc

block: Unhash block device inodes on gendisk destruction · f44f1ab5

Committed by Jan Kara on Feb 02, 2017

Currently, block device inodes stay around after the corresponding gendisk
has died, until memory reclaim finds them and frees them. Since we will
make the block device inode pin the bdi, we want to free the block device
inode as soon as the device goes away so that the bdi does not stay around
unnecessarily. Furthermore, we need to avoid issues when a new device with
the same major,minor pair gets created, since reusing the bdi structure
would be rather difficult in this case.

Unhashing the block device inode on gendisk destruction nicely deals with
these problems. Once the last block device inode reference is dropped (which
may be directly in del_gendisk()), the inode gets evicted. Furthermore, if
the major,minor pair gets reallocated, we are guaranteed to get a new
block device inode even if the old block device inode is not yet evicted,
and thus we avoid issues with possible reuse of the bdi.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

f44f1ab5

nbd: use an idr to keep track of nbd devices · b0d9111a

Committed by Josef Bacik on Feb 01, 2017

To prepare for dynamically adding new nbd devices to the system, switch
from using an array for the nbd devices to using an idr.  This
copies what loop does for keeping track of its devices.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

b0d9111a
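
Compared with a fixed array indexed by device number, an ID map lets devices be added and removed individually. A rough user-space sketch of the idea follows; toy_idr_alloc(), toy_idr_find() and toy_idr_remove() are hypothetical names, and this fixed-size table only imitates lookup by ID and lowest-free-ID allocation, not the kernel idr's on-demand growth or locking.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy, fixed-capacity ID map in the spirit of the kernel's idr: it maps
 * small integer IDs to pointers and hands out the lowest free ID. The
 * real idr grows on demand; this sketch does not.
 */
#define TOY_IDR_MAX 16

struct toy_idr {
	void *slots[TOY_IDR_MAX];	/* NULL means the ID is free */
};

/* Returns the new ID, or -1 if all IDs are in use. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr)
{
	assert(ptr != NULL);	/* NULL is reserved to mark free slots */
	for (int id = 0; id < TOY_IDR_MAX; id++) {
		if (!idr->slots[id]) {
			idr->slots[id] = ptr;
			return id;
		}
	}
	return -1;
}

static void *toy_idr_find(const struct toy_idr *idr, int id)
{
	if (id < 0 || id >= TOY_IDR_MAX)
		return NULL;
	return idr->slots[id];
}

static void toy_idr_remove(struct toy_idr *idr, int id)
{
	if (id >= 0 && id < TOY_IDR_MAX)
		idr->slots[id] = NULL;
}
```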

nbd: use our own workqueue for recv threads · 124d6db0

Committed by Josef Bacik on Feb 01, 2017

Since we are in the memory reclaim path, we need our recv work to be on a
workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
set WQ_HIGHPRI since we are in the completion path for IO.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

124d6db0

blk-mq-debug: Introduce debugfs_create_files() · 72f2f8f6

Committed by Bart Van Assche on Feb 01, 2017

Replace the two debugfs_create_file() loops by a call to the new
debugfs_create_files() function. Add an empty element at the end
of the two attribute arrays such that the array size does not have
to be passed to debugfs_create_files().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

72f2f8f6
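
A sentinel-terminated table lets a helper find the end of the array on its own. The sketch below uses hypothetical simplifications (struct dbg_attr, create_files(), fake_create()), not the blk-mq-debugfs code, and the attribute names are only examples.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified attribute entry (assumption, not the kernel's struct). */
struct dbg_attr {
	const char *name;
	unsigned int mode;
};

/*
 * Hypothetical helper: create one file per entry, stopping at the empty
 * sentinel element, so callers never pass the array length.
 * Returns the number of files created, or -1 on the first failure.
 */
static int create_files(const struct dbg_attr *attr,
			int (*create)(const char *name, unsigned int mode))
{
	int n = 0;

	assert(attr != NULL && create != NULL);
	for (; attr->name; attr++) {
		if (create(attr->name, attr->mode))
			return -1;
		n++;
	}
	return n;
}

/* Example table; the final { NULL, 0 } entry is the terminator. */
static const struct dbg_attr example_attrs[] = {
	{ "state", 0400 },
	{ "flags", 0400 },
	{ NULL, 0 },
};

static int files_created;

/* Stand-in for debugfs_create_file(): just counts calls. */
static int fake_create(const char *name, unsigned int mode)
{
	(void)name;
	(void)mode;
	files_created++;
	return 0;
}
```

Adding an attribute then means adding one table row; no length constant has to be kept in sync.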

blk-mq-debug: Make show() operations interruptible · 8c0f14ea

Committed by Bart Van Assche on Feb 01, 2017

Allow users to interrupt show operations instead of making a user
space process unkillable if ownership of q->sysfs_lock cannot be
obtained.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8c0f14ea

blk-mq-debug: Avoid that sparse complains about req_flags_t usage · a1ae0f74

Committed by Bart Van Assche on Feb 01, 2017

Avoid the following sparse complaints:

block/elevator.c:541:29: warning: incorrect type in assignment (different base types)
block/elevator.c:541:29:    expected bool [unsigned] [usertype] next_sorted
block/elevator.c:541:29:    got restricted req_flags_t

block/blk-mq-debugfs.c:92:54: warning: cast from restricted req_flags_t
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

a1ae0f74
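
A plain-C sketch of producing a genuine bool from a flag test, as opposed to assigning the masked flag value directly to a bool. It cannot reproduce sparse's restricted-type checking, RQF_SORTED's value and rq_is_sorted() are hypothetical, and the patch itself may use a different idiom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's req_flags_t. In the kernel it is a sparse
 * "__bitwise" (restricted) type; here it is a plain typedef, so this only
 * shows the shape of the conversion, not sparse's checking.
 */
typedef uint32_t req_flags_t;
#define RQF_SORTED ((req_flags_t)(1u << 2))	/* hypothetical bit */

/*
 * Turn a flag test into a real bool with an explicit comparison,
 * instead of assigning the masked flags value straight to a bool.
 */
static bool rq_is_sorted(req_flags_t rq_flags)
{
	return (rq_flags & RQF_SORTED) != 0;
}
```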

blk-mq-debugfs: Add missing __acquires() / __releases() annotations · f3bcb0e6

Committed by Bart Van Assche on Feb 01, 2017

This patch prevents sparse from complaining about lock imbalances.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f3bcb0e6

01 Feb, 2017 · 14 commits

block: move internal_tag to same cache line as tag · d486f1f2

Committed by Jens Axboe on Jan 31, 2017

Since we removed cmd_type, we now have a hole in the struct. Move
the internal_tag member to the same cacheline as tag, since we
use them at the same time.

This doesn't fix the hole, just moves it elsewhere.
Signed-off-by: Jens Axboe <axboe@fb.com>

d486f1f2
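
A toy illustration of the trade-off, using offsetof() to see which 64-byte line each field lands on. The structs are invented for the example and are not struct request, the 64-byte line size is an assumption, and the layouts are chosen so that, on common LP64 and ILP32 ABIs, the hole moves rather than disappears, as the commit says.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* common cache-line size; an assumption */

/* Toy layout: a 4-byte field was removed after 'tag', leaving a hole. */
struct req_before {
	int tag;
	/* 4-byte hole on LP64 */
	void *q;
	char payload[64];	/* pushes internal_tag onto a later line */
	int internal_tag;
	int extra;
};

/* internal_tag moved next to tag; the hole moves to the end instead. */
struct req_after {
	int tag;
	int internal_tag;
	void *q;
	char payload[64];
	int extra;
	/* 4-byte tail padding on LP64 */
};

/* Assumes the struct itself starts on a cache-line boundary. */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE == off_b / CACHELINE;
}
```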

block: fold cmd_type into the REQ_OP_ space · aebf526b

Committed by Christoph Hellwig on Jan 31, 2017

Instead of keeping two levels of indirection for request types, fold it
all into the operations.  The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.

Instead, this patch adds new REQ_OP_* values for SCSI passthrough and driver
private requests, although it has to add two for each so that we
can communicate the data in/out nature of the request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

aebf526b
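
To make the in/out pairing concrete, here is a sketch in which each new op kind occupies an even/odd pair, so the data direction is a single bit. The numeric values and helper names are illustrative assumptions, not a copy of the kernel's enum; op_is_passthrough() only mirrors the idea behind blk_rq_is_passthrough() introduced in 57292b58.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical numbering: even = data in (read-like),
 * odd = data out (write-like). Values are illustrative only.
 */
enum req_opf {
	REQ_OP_READ     = 0,
	REQ_OP_WRITE    = 1,
	REQ_OP_SCSI_IN  = 32,
	REQ_OP_SCSI_OUT = 33,
	REQ_OP_DRV_IN   = 34,
	REQ_OP_DRV_OUT  = 35,
};

/* With the pairing above, the direction is just the low bit. */
static bool op_is_write(enum req_opf op)
{
	return op & 1;
}

/* Passthrough/private check in the spirit of blk_rq_is_passthrough(). */
static bool op_is_passthrough(enum req_opf op)
{
	return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT ||
	       op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
}
```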

ide: don't abuse cmd_type · 2f5a8e80

Committed by Christoph Hellwig on Jan 31, 2017

Currently the legacy ide driver defines several request types of its own,
which is in the way of removing that field entirely.

Instead, add a type field to struct ide_request and use that to distinguish
the different types of IDE-internal requests.

It's a bit of a mess, but so is the surrounding code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@fb.com>

2f5a8e80

block: introduce blk_rq_is_passthrough · 57292b58

Committed by Christoph Hellwig on Jan 31, 2017

This can be used to check for fs vs non-fs requests and basically
removes all knowledge of BLOCK_PC specifics from the block layer,
as well as preparing for removing the cmd_type field in struct request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

57292b58

nbd: move request validity checking into nbd_send_cmd · 09fc54cc

Committed by Christoph Hellwig on Jan 31, 2017

This is where we do the rest of the request handling, which will
become much simpler soon, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

09fc54cc

nbd: remove REQ_TYPE_DRV_PRIV leftovers · 27410a89

Committed by Christoph Hellwig on Jan 31, 2017

Disconnects don't use block layer requests these days, so all handling
of private requests is dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

27410a89

mspro_block: remove pointless prep_fn · 55460a8a

Committed by Christoph Hellwig on Jan 31, 2017

This driver will never see non-fs requests, and doesn't do anything
else in the prep_fn.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

55460a8a

ms_block: remove pointless prep_fn · cf22f802

Committed by Christoph Hellwig on Jan 31, 2017

This driver will never see non-fs requests, and doesn't do anything
else in the prep_fn.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

cf22f802

mmc: remove pointless request type check in mmc_prep_request · 261c83c1

Committed by Christoph Hellwig on Jan 31, 2017

The block layer won't send requests the driver isn't asking for,
so remove this check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

261c83c1

sd: remove pointless REQ_TYPE_FS check · 68b568c7

Committed by Christoph Hellwig on Jan 31, 2017

->done can only be called for fs requests, so no need to check again here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

68b568c7

scm_blk: remove unneeded REQ_TYPE_FS check · 1dd128a1

Committed by Christoph Hellwig on Jan 31, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

1dd128a1

virtio_blk: make SCSI passthrough support configurable · 97b50a65

Committed by Christoph Hellwig on Jan 28, 2017

The SCSI passthrough idea was a bad idea to start with (guess who came
up with it?), and has been removed from the virtio 1.0 spec, and is not
enabled by default by any host I know of.  Add a separate config option
for it so that we don't need to enable it for most setups.  That way
any bugs related to it (like the one recently fixed for vmapped stacks)
do not affect other users, and the size of the virtblk_req structure
also shrinks significantly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

97b50a65

virtio_blk: remove struct request backpointer from virtblk_req · 85dada09

Committed by Christoph Hellwig on Jan 28, 2017

We can simply use blk_mq_rq_from_pdu to get back at the request at
I/O completion time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

85dada09
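
This works because blk-mq allocates the driver's per-request data (the pdu) directly after struct request, so the request sits at a fixed negative offset from it and no backpointer is needed. A user-space sketch of that pointer arithmetic follows; the structs are stand-ins, rq_to_pdu(), rq_from_pdu() and alloc_rq_with_pdu() are hypothetical names, and it assumes, as this layout requires, that sizeof(struct request) keeps the pdu suitably aligned.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins (assumption): request header followed by driver data. */
struct request {
	int tag;
};

struct virtblk_req {
	int status;	/* no backpointer to struct request needed */
};

/* One allocation: [struct request][driver pdu of pdu_size bytes]. */
static struct request *alloc_rq_with_pdu(size_t pdu_size)
{
	return calloc(1, sizeof(struct request) + pdu_size);
}

/* The pdu starts at the first byte after the request. */
static void *rq_to_pdu(struct request *rq)
{
	return rq + 1;
}

/* Step back over the request header to recover the request. */
static struct request *rq_from_pdu(void *pdu)
{
	return (struct request *)((char *)pdu - sizeof(struct request));
}
```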

block: make scsi_request and scsi ioctl support optional · 72148aec

Committed by Christoph Hellwig on Jan 28, 2017

We only need this code to support scsi, ide, cciss and virtio.  And at
least for virtio it's a deprecated feature to start with.

This should shrink the kernel size a bit for embedded devices that only
use, say, eMMC.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

72148aec
