1. 09 March 2018, 4 commits
  2. 07 March 2018, 2 commits
  3. 01 March 2018, 17 commits
  4. 28 February 2018, 4 commits
    • nvme-multipath: fix sysfs dangerously created links · 9bd82b1a
      Authored by Baegjae Sung
      If multipathing is enabled, each NVMe subsystem creates a head
      namespace (e.g., nvme0n1) and multiple private namespaces
      (e.g., nvme0c0n1 and nvme0c1n1) in sysfs. When the links for a
      private namespace are created, the links of the head namespace are
      used, so the namespace creation order must be followed (e.g.,
      nvme0n1 -> nvme0c1n1). If the order is not followed, the sysfs
      links are incomplete or a kernel panic occurs.
      
      The kernel panic was:
        kernel BUG at fs/sysfs/symlink.c:27!
        Call Trace:
          nvme_mpath_add_disk_links+0x5d/0x80 [nvme_core]
          nvme_validate_ns+0x5c2/0x850 [nvme_core]
          nvme_scan_work+0x1af/0x2d0 [nvme_core]
      
      Correct order
      Context A     Context B
      nvme0n1
      nvme0c0n1     nvme0c1n1
      
      Incorrect order
      Context A     Context B
                    nvme0c1n1
      nvme0n1
      nvme0c0n1
      
      nvme_mpath_add_disk() (which creates the head namespace) is called
      just before nvme_mpath_add_disk_links() (which creates the links for
      the private namespaces). In nvme_mpath_add_disk(), the first context
      acquires the subsystem lock and creates the head namespace; the
      other contexts, after waiting for the lock, see GENHD_FL_UP set on
      the head namespace and do nothing. We verified the code both with
      and without multipathing, using dual-port NVMe SSDs from three
      vendors.
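      A minimal sketch of that serialization pattern, for illustration only
      (nvme_subsys_sketch and nvme_head_sketch are simplified stand-ins,
      not the real struct nvme_subsystem / struct nvme_ns_head):

      #include <linux/mutex.h>
      #include <linux/genhd.h>

      struct nvme_subsys_sketch {
              struct mutex lock;
      };

      struct nvme_head_sketch {
              struct nvme_subsys_sketch *subsys;
              struct gendisk *disk;
      };

      /* Only the first context registers the head namespace; later contexts
       * see GENHD_FL_UP under the same lock and skip, so the head's sysfs
       * directory always exists before any private-namespace links are made. */
      static void mpath_add_disk_sketch(struct nvme_head_sketch *head)
      {
              mutex_lock(&head->subsys->lock);
              if (!(head->disk->flags & GENHD_FL_UP))
                      add_disk(head->disk);
              mutex_unlock(&head->subsys->lock);
      }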
      Signed-off-by: Baegjae Sung <baegjae@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
    • nbd: fix return value in error handling path · 0979962f
      Authored by Gustavo A. R. Silva
      It seems that the proper value to return in this particular case is
      the one contained in the variable new_index rather than ret.
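      A hedged illustration of the copy-paste pattern being fixed (the
      function and helper names here are assumptions based on the
      description, not the actual nbd code):

      #include <linux/errno.h>

      /* Assumed helper standing in for the real index allocation; returns a
       * negative errno on failure. */
      static int allocate_index(void)
      {
              return -ENOMEM;   /* placeholder failure for the sketch */
      }

      /* Sketch: an error path that must propagate the freshly obtained
       * error code, not an earlier, stale 'ret'. */
      static int nbd_add_device_sketch(void)
      {
              int ret = 0;
              int new_index = allocate_index();

              if (new_index < 0) {
                      /* Bug pattern: "return ret;" here returned 0 (success).
                       * Fix: return the actual error held in new_index. */
                      return new_index;
              }
              ret = new_index;
              return ret;
      }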
      
      Addresses-Coverity-ID: 1465148 ("Copy-paste error")
      Fixes: e46c7287 ("nbd: add a basic netlink interface")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix kcrashes with fio in RAID5 backend dev · 60eb34ec
      Authored by Tang Junhui
      The kernel crashed when running fio on a bcache device with a RAID5
      backing device; the call trace is below:
      [  440.012034] kernel BUG at block/blk-ioc.c:146!
      [  440.012696] invalid opcode: 0000 [#1] SMP NOPTI
      [  440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
      [  440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
      [  440.028615] RIP: 0010:put_io_context+0x8b/0x90
      [  440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
      [  440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240
      [  440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0
      [  440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700
      [  440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000
      [  440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0
      [  440.035239] FS:  0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000
      [  440.060190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0
      [  440.110498] Call Trace:
      [  440.135443]  bio_disassociate_task+0x1b/0x60
      [  440.160355]  bio_free+0x1b/0x60
      [  440.184666]  bio_put+0x23/0x30
      [  440.208272]  search_free+0x23/0x40 [bcache]
      [  440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
      [  440.254468]  closure_put+0xb6/0xd0 [bcache]
      [  440.277087]  request_endio+0x30/0x40 [bcache]
      [  440.298703]  bio_endio+0xa1/0x120
      [  440.319644]  handle_stripe+0x418/0x2270 [raid456]
      [  440.340614]  ? load_balance+0x17b/0x9c0
      [  440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
      [  440.380675]  ? __release_stripe+0x15/0x20 [raid456]
      [  440.400132]  raid5d+0x3ed/0x5d0 [raid456]
      [  440.419193]  ? schedule+0x36/0x80
      [  440.437932]  ? schedule_timeout+0x1d2/0x2f0
      [  440.456136]  md_thread+0x122/0x150
      [  440.473687]  ? wait_woken+0x80/0x80
      [  440.491411]  kthread+0x102/0x140
      [  440.508636]  ? find_pers+0x70/0x70
      [  440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
      [  440.541791]  ret_from_fork+0x35/0x40
      [  440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
      [  440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
      [  440.628575] ---[ end trace a1fd79d85643a73e ]---
      
      The crash happens when a bypass IO arrives; in that scenario
      s->iop.bio points to s->orig_bio. In search_free(), bio_complete()
      finishes s->orig_bio, after which s->iop.bio becomes invalid, so the
      kernel crashes on the subsequent bio_put(). Arguably this is the
      upper layer's fault, since the bio should not be freed before we
      call bio_put(), but it is safer to call bio_put() first and only
      then call bio_complete() to notify the upper layer that the bio has
      ended.
      
      This patch moves bio_complete() after bio_put() to avoid the kernel
      crash.
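      A hedged sketch of the reordering (simplified; search_sketch stands in
      for bcache's struct search, bio_complete_sketch() for the helper that
      ends s->orig_bio, and an extra reference on the aliased bio, taken via
      bio_get() in the bypass path, is assumed):

      #include <linux/bio.h>

      struct search_sketch {
              struct bio *iop_bio;     /* may alias orig_bio on bypass IO */
              struct bio *orig_bio;
      };

      static void bio_complete_sketch(struct search_sketch *s)
      {
              bio_endio(s->orig_bio);  /* hand the bio back to the upper layer */
      }

      static void search_free_sketch(struct search_sketch *s)
      {
              /* Drop our own reference before completing the original bio;
               * with the old order the upper layer could free the bio while
               * s->iop_bio still pointed at it. */
              if (s->iop_bio)
                      bio_put(s->iop_bio);
              bio_complete_sketch(s);
      }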
      
      [mlyle: fixed commit subject for character limits]
      Reported-by: Matthias Ferdinand <bcache@mfedv.net>
      Tested-by: Matthias Ferdinand <bcache@mfedv.net>
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: correct flash only vols (check all uuids) · 02aa8a8b
      Authored by Coly Li
      Commit 2831231d ("bcache: reduce cache_set devices iteration by
      devices_max_used") added c->devices_max_used to reduce the iteration
      over c->uuids elements; the value is updated in
      bcache_device_attach().

      But for flash-only volumes, bcache_device_attach() has not been
      called yet when flash_devs_run() runs, so c->devices_max_used is not
      updated. The unexpected result is that flash-only volumes are not
      started by flash_devs_run().

      This patch fixes the issue by iterating over all c->uuids elements
      in flash_devs_run(); c->devices_max_used is updated properly once
      bcache_device_attach() is called.
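      A hedged sketch of the changed iteration bound (cache_set_sketch,
      uuid_entry_sketch, and flash_dev_run_sketch() are simplified stand-ins
      for the real bcache structures and helpers):

      #include <linux/types.h>

      struct uuid_entry_sketch {
              bool flash_only;                 /* stand-in for UUID_FLASH_ONLY() */
      };

      struct cache_set_sketch {
              struct uuid_entry_sketch *uuids;
              unsigned int nr_uuids;
              unsigned int devices_max_used;   /* only valid after attach */
      };

      /* Assumed helper: create and start one flash-only volume. */
      static void flash_dev_run_sketch(struct cache_set_sketch *c,
                                       struct uuid_entry_sketch *u)
      {
              (void)c;
              (void)u;
      }

      static void flash_devs_run_sketch(struct cache_set_sketch *c)
      {
              unsigned int i;

              /* Scan the whole uuid array: at this point
               * bcache_device_attach() has not run yet, so the cached
               * c->devices_max_used bound would skip flash-only volumes. */
              for (i = 0; i < c->nr_uuids; i++)
                      if (c->uuids[i].flash_only)
                              flash_dev_run_sketch(c, &c->uuids[i]);
      }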
      
      [mlyle: commit subject edited for character limit]
      
      Fixes: 2831231d ("bcache: reduce cache_set devices iteration by devices_max_used")
      Reported-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 27 February 2018, 8 commits
    • blktrace_api.h: fix comment for struct blk_user_trace_setup · 9c722588
      Authored by Eric Biggers
      'struct blk_user_trace_setup' is passed to BLKTRACESETUP, not
      BLKTRACESTART.
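      For context, a minimal userspace sketch of how the two ioctls are used
      (device path, buffer sizes, and error handling are arbitrary example
      choices): the setup struct is the argument of BLKTRACESETUP, while
      BLKTRACESTART takes no argument at all.

      #include <fcntl.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <linux/fs.h>
      #include <linux/blktrace_api.h>

      /* Returns an fd that must stay open while tracing, or -1 on error. */
      int start_blktrace(const char *dev)          /* e.g. "/dev/sda" */
      {
              struct blk_user_trace_setup buts;
              int fd = open(dev, O_RDONLY | O_NONBLOCK);

              if (fd < 0)
                      return -1;

              memset(&buts, 0, sizeof(buts));
              buts.buf_size = 512 * 1024;          /* example buffer sizing */
              buts.buf_nr = 4;
              buts.act_mask = 0xffff;              /* trace all action types */

              /* struct blk_user_trace_setup goes with BLKTRACESETUP ...    */
              if (ioctl(fd, BLKTRACESETUP, &buts) < 0 ||
                  /* ... BLKTRACESTART merely starts the configured trace. */
                  ioctl(fd, BLKTRACESTART) < 0) {
                      close(fd);
                      return -1;
              }
              return fd;
      }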
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blockdev: Avoid two active bdev inodes for one device · 560e7cb2
      Authored by Jan Kara
      When blkdev_open() races with device removal and creation, it can
      happen that an unhashed bdev inode gets associated with a newly
      created gendisk, like:
      
      CPU0					CPU1
      blkdev_open()
        bdev = bd_acquire()
      					del_gendisk()
      					  bdev_unhash_inode(bdev);
      					remove device
      					create new device with the same number
        __blkdev_get()
          disk = get_gendisk()
            - gets reference to gendisk of the new device
      
      Now another blkdev_open() will not find the original 'bdev' as it
      got unhashed, will create a new one, and will associate it with the
      same 'disk', at which point problems start: we have two independent
      page caches for one device.
      
      Fix the problem by verifying that the bdev inode did not get
      unhashed before we acquired the gendisk reference. That way we make
      sure a gendisk can get associated only with visible bdev inodes.
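      A hedged sketch of the check described above (the placement and the
      retry handling in __blkdev_get() are assumptions; inode_unhashed() is
      the usual VFS helper):

      #include <linux/fs.h>
      #include <linux/genhd.h>

      /* Sketch: only pair a gendisk with a bdev inode that is still hashed,
       * i.e. still visible to future bd_acquire() lookups. */
      static struct gendisk *disk_for_bdev_sketch(struct block_device *bdev,
                                                  int *partno)
      {
              struct gendisk *disk = get_gendisk(bdev->bd_dev, partno);

              if (disk && inode_unhashed(bdev->bd_inode)) {
                      /* The inode was unhashed by del_gendisk() while we
                       * looked up the (possibly brand-new) gendisk: drop the
                       * reference and let the caller retry with a fresh bdev
                       * inode.  (Module reference handling omitted here.) */
                      put_disk(disk);
                      return NULL;
              }
              return disk;
      }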
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Fix BUG in blkdev_open() · 56c0908c
      Authored by Jan Kara
      When two blkdev_open() calls for a partition race with device removal
      and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
      blkdev_open(). The race can happen as follows:
      
      CPU0				CPU1			CPU2
      							del_gendisk()
      							  bdev_unhash_inode(part1);
      
      blkdev_open(part1, O_EXCL)	blkdev_open(part1, O_EXCL)
        bdev = bd_acquire()		  bdev = bd_acquire()
        blkdev_get(bdev)
          bd_start_claiming(bdev)
            - finds old inode 'whole'
            bd_prepare_to_claim() -> 0
      							  bdev_unhash_inode(whole);
      							<device removed>
      							<new device under same
      							 number created>
      				  blkdev_get(bdev);
      				    bd_start_claiming(bdev)
      				      - finds new inode 'whole'
      				      bd_prepare_to_claim()
      					- this also succeeds as we have
      					  different 'whole' here...
      					- bad things happen now as we
      					  have two exclusive openers of
      					  the same bdev
      
      The problem here is that block device opens can see various
      intermediate states while a gendisk is shutting down and then being
      recreated.
      
      We fix the problem by introducing a new lookup_sem in gendisk that
      synchronizes gendisk deletion with get_gendisk(), and furthermore by
      making sure that get_gendisk() does not return a gendisk that is
      being (or has been) deleted. This makes sure that once we manage to
      look up the newly created bdev inode, we are also guaranteed that
      the following get_gendisk() will either return failure (and we fail
      the open) or return the gendisk for the new device, and the
      following bdget_disk() will return the new bdev inode (i.e.,
      blkdev_open() follows the path as if it were run entirely after the
      new device is created).
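      A hedged sketch of the synchronization pattern (the field layout and
      the "dead" flag are simplified assumptions; the real change lives in
      block/genhd.c around del_gendisk() and get_gendisk()):

      #include <linux/rwsem.h>
      #include <linux/genhd.h>

      /* Sketch: deletion marks the disk dead under lookup_sem held for
       * write; lookups take it for read and refuse disks already marked. */
      struct disk_lookup_sketch {
              struct rw_semaphore lookup_sem;  /* init_rwsem() at setup time */
              bool dead;
              struct gendisk *disk;
      };

      static void del_gendisk_sketch(struct disk_lookup_sketch *d)
      {
              down_write(&d->lookup_sem);
              d->dead = true;                  /* no lookup may return it now */
              up_write(&d->lookup_sem);
              /* ... proceed with the actual teardown ... */
      }

      static struct gendisk *get_gendisk_sketch(struct disk_lookup_sketch *d)
      {
              struct gendisk *disk = NULL;

              down_read(&d->lookup_sem);
              if (!d->dead)
                      disk = d->disk;          /* take proper references here */
              up_read(&d->lookup_sem);
              return disk;
      }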
      Reported-and-analyzed-by: Hou Tao <houtao1@huawei.com>
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Fix use after free in __blkdev_get() · 89736653
      Authored by Jan Kara
      When two blkdev_open() calls race with device removal and
      recreation, __blkdev_get() can use the looked-up gendisk after it
      has been freed:
      
      CPU0				CPU1			CPU2
      							del_gendisk(disk);
      							  bdev_unhash_inode(inode);
      blkdev_open()			blkdev_open()
        bdev = bd_acquire(inode);
          - creates and returns new inode
      				  bdev = bd_acquire(inode);
      				    - returns the same inode
        __blkdev_get(devt)		  __blkdev_get(devt)
          disk = get_gendisk(devt);
            - got structure of device going away
      							<finish device removal>
      							<new device gets
      							 created under the same
      							 device number>
      				  disk = get_gendisk(devt);
      				    - got new device structure
      				  if (!bdev->bd_openers) {
      				    does the first open
      				  }
          if (!bdev->bd_openers)
            - false
          } else {
            put_disk_and_module(disk)
              - remember this was old device - this was last ref and disk is
                now freed
          }
          disk_unblock_events(disk); -> oops
      
      Fix the problem by making sure we drop the reference to the disk in
      __blkdev_get() only after we are really done with it.
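      A hedged sketch of the ordering fix (heavily simplified control flow;
      the real __blkdev_get() does far more, and put_disk_and_module() is
      the helper introduced later in this series):

      #include <linux/errno.h>
      #include <linux/genhd.h>

      /* Sketch: keep the gendisk reference until every use of 'disk' is
       * finished; dropping it mid-way lets a racing removal free it. */
      static int blkdev_get_sketch(dev_t devt)
      {
              int partno;
              struct gendisk *disk = get_gendisk(devt, &partno);

              if (!disk)
                      return -ENXIO;

              disk_block_events(disk);
              /* ... open / claim the block device using 'disk' ... */
              disk_unblock_events(disk);

              /* Only now is it safe to drop the reference. */
              put_disk_and_module(disk);
              return 0;
      }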
      Reported-by: Hou Tao <houtao1@huawei.com>
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Add helper put_disk_and_module() · 9df6c299
      Authored by Jan Kara
      Add a proper counterpart to get_disk_and_module():
      put_disk_and_module(). Currently it is open-coded in several places.
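      A minimal sketch of what such a helper looks like (the actual
      implementation lives in block/genhd.c and may differ in detail):

      #include <linux/genhd.h>
      #include <linux/module.h>

      /* Sketch: drop the gendisk reference together with the driver-module
       * reference that get_disk_and_module() acquired. */
      static void put_disk_and_module_sketch(struct gendisk *disk)
      {
              if (disk) {
                      struct module *owner = disk->fops->owner;

                      /* Read the owner before put_disk(): the final put may
                       * free the gendisk and the fops pointer with it. */
                      put_disk(disk);
                      module_put(owner);
              }
      }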
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Rename get_disk() to get_disk_and_module() · 3079c22e
      Authored by Jan Kara
      Rename get_disk() to get_disk_and_module() to make clear what the
      function does. It is not a great name, but at least it is now
      obvious that put_disk() is not its counterpart.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • genhd: Fix leaked module reference for NVME devices · d52987b5
      Authored by Jan Kara
      Commit 8ddcd653 ("block: introduce GENHD_FL_HIDDEN") added handling
      of hidden devices to get_gendisk() but forgot to drop the module
      reference that is also acquired by get_disk(). Drop the reference as
      necessary.
      
      Arguably the function naming here is misleading, as put_disk() is
      *not* the counterpart of get_disk(), but let's fix that in the
      follow-up commit since that will be more intrusive.
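      A hedged sketch of the kind of fix described (the exact upstream hunk
      may differ): when a hidden disk is rejected, both the disk reference
      and the module reference taken during the lookup must be released.

      #include <linux/genhd.h>
      #include <linux/module.h>

      /* Sketch: reject GENHD_FL_HIDDEN disks without leaking the module
       * reference acquired alongside the disk reference. */
      static struct gendisk *reject_hidden_sketch(struct gendisk *disk)
      {
              if (disk && (disk->flags & GENHD_FL_HIDDEN)) {
                      struct module *owner = disk->fops->owner;

                      put_disk(disk);
                      module_put(owner);       /* previously leaked */
                      disk = NULL;
              }
              return disk;
      }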
      
      Fixes: 8ddcd653
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • direct-io: Fix sleep in atomic due to sync AIO · d9c10e5b
      Authored by Jan Kara
      Commit e864f395 ("fs: add RWF_DSYNC aand RWF_SYNC") added an
      additional way for direct IO to become synchronous and thus trigger
      fsync from the IO completion handler. Then commit 9830f4be ("fs: Use
      RWF_* flags for AIO operations") allowed these flags to be set for
      AIO as well. However, that commit forgot to update the condition
      checking whether the IO completion handling should be deferred to a
      workqueue, so AIO DIO with RWF_[D]SYNC set will call fsync() from
      IRQ context, resulting in a sleep in atomic context.
      
      Fix the problem by checking the iocb flags directly (the same way as
      is done in dio_complete()) instead of checking all the conditions
      that could lead to the IO being synchronous.
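      A hedged sketch of the idea (not the exact fs/direct-io.c hunk):
      decide whether to defer completion to a workqueue by looking at the
      iocb's sync flags directly, because the bio completion handler runs
      in IRQ context where fsync-like work must not sleep.

      #include <linux/fs.h>

      static bool dio_defer_completion_sketch(const struct kiocb *iocb,
                                              bool defer_completion)
      {
              if (defer_completion)
                      return true;
              /* RWF_DSYNC / RWF_SYNC surface as IOCB_DSYNC / IOCB_SYNC. */
              return (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) != 0;
      }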
      
      CC: Christoph Hellwig <hch@lst.de>
      CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
      CC: stable@vger.kernel.org
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: 9830f4be
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 26 February 2018, 1 commit
  7. 25 February 2018, 2 commits
  8. 24 February 2018, 1 commit
  9. 23 February 2018, 1 commit
    • MIPS: boot: Define __ASSEMBLY__ for its.S build · 0f9da844
      Authored by Kees Cook
      The MIPS %.its.S compiler command did not define __ASSEMBLY__, which
      meant that when compiler_types.h was added to kconfig.h, unexpected
      things (e.g., struct declarations) appeared that should not have
      been present. As done in the general %.S compiler command,
      __ASSEMBLY__ is now defined here too.
      
      The failure was:
      
          Error: arch/mips/boot/vmlinux.gz.its:201.1-2 syntax error
          FATAL ERROR: Unable to parse input tree
          /usr/bin/mkimage: Can't read arch/mips/boot/vmlinux.gz.itb.tmp: Invalid argument
          /usr/bin/mkimage Can't add hashes to FIT blob
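      For context, a hedged illustration of why the define matters: kernel
      headers hide C-only declarations behind __ASSEMBLY__ guards, so
      preprocessing the .its source without -D__ASSEMBLY__ leaks struct
      declarations into output that the device-tree tooling cannot parse
      (example guard below, not copied from a specific header):

      /* Typical guard pattern in kernel headers (illustrative example). */
      #ifndef __ASSEMBLY__
      /* C-only declarations: hidden when preprocessing .S / .its sources
       * that pass -D__ASSEMBLY__, which is exactly what the %.its.S rule
       * was missing. */
      struct example_attr_holder {
              int value;
      } __attribute__((packed));
      #endif /* __ASSEMBLY__ */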
      Reported-by: kbuild test robot <lkp@intel.com>
      Fixes: 28128c61 ("kconfig.h: Include compiler types to avoid missed struct attributes")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>