1. 10 Jul, 2019 (10 commits)
  2. 06 Jul, 2019 (1 commit)
    • blk-iolatency: fix STS_AGAIN handling · c9b3007f
      Authored by Dennis Zhou
      The iolatency controller is based on rq_qos. It increments on
      rq_qos_throttle() and decrements on either rq_qos_cleanup() or
      rq_qos_done_bio(). a3fb01ba fixes the double-accounting issue where
      blk_mq_make_request() may call both rq_qos_cleanup() and
      rq_qos_done_bio() on REQ_NOWAIT, so checking STS_AGAIN prevents the
      double decrement.
      
      The above works upstream as the only way we can get STS_AGAIN is from
      blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
      problem, as skipping bio_endio() only happens on reserved-tag
      allocation failures, which can only be caused by driver bugs and
      already trigger a WARN.
      
      However, the fix creates a not-so-great dependency on how STS_AGAIN
      can be propagated. Internally, we (Facebook) carry a patch that kills
      readahead if a cgroup is io congested or a fatal signal is pending.
      This, combined with the fact that chained bios propagate their
      bi_status to the parent if it is not already set, can cause the
      parent bio to not clean up properly even though it was successful.
      This consequently leaks the inflight counter and can hang all IOs
      under that blkg.
      
      To nip the adverse interaction early, this removes the
      rq_qos_cleanup() callback in iolatency in favor of always cleaning up
      on the rq_qos_done_bio() path (a sketch of the pattern follows this
      entry).
      
      Fixes: a3fb01ba ("blk-iolatency: only account submitted bios")
      Debugged-by: Tejun Heo <tj@kernel.org>
      Debugged-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c9b3007f
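
      A minimal userspace C sketch of the single-decrement pattern the fix
      moves to. This is illustrative, not the kernel code: the struct and
      function names below are made up. The idea is to account at throttle
      time by marking the bio, and to decrement only in the done path,
      keyed on that mark, so it no longer matters how many cleanup
      callbacks fire or how bi_status propagates up a bio chain.

        #include <stdbool.h>
        #include <stdio.h>

        /* Illustrative stand-ins for the kernel's bio and per-cgroup state. */
        struct group { int inflight; };
        struct bio   { struct group *grp; bool tracked; };

        static void throttle(struct bio *bio)
        {
                bio->grp->inflight++;
                bio->tracked = true;    /* mark: this bio was accounted */
        }

        /* The only place the counter is decremented. */
        static void done_bio(struct bio *bio)
        {
                if (!bio->tracked)      /* never accounted: nothing to undo */
                        return;
                bio->tracked = false;   /* makes a repeat call harmless */
                bio->grp->inflight--;
        }

        int main(void)
        {
                struct group g = { 0 };
                struct bio b = { &g, false };

                throttle(&b);
                done_bio(&b);
                done_bio(&b);           /* double completion: no underflow */
                printf("inflight = %d\n", g.inflight);  /* prints 0 */
                return 0;
        }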
  3. 03 Jul, 2019 (3 commits)
  4. 02 Jul, 2019 (1 commit)
  5. 01 Jul, 2019 (2 commits)
    • block: fix .bi_size overflow · 79d08f89
      Authored by Ming Lei
      'bio->bi_iter.bi_size' is an 'unsigned int', which can hold at most
      4G - 1 bytes.
      
      Before 07173c3e ("block: enable multipage bvecs"), one bio could
      include only a limited number of pages, usually at most 256, so an
      fs bio's size would rarely exceed 1M bytes.
      
      Since multi-page bvecs are supported, in theory more than 1M pages
      really can be added to one fs bio, especially with hugepages or big
      writeback with many dirty pages, so there is a chance that .bi_size
      overflows.

      Fix this issue by using bio_full() to check whether an added segment
      may overflow .bi_size (a sketch of the extended check follows this
      entry).
      
      Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: linux-xfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      79d08f89
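
      The fix extends bio_full() to take the length about to be added, so
      callers such as __bio_add_page() can reject a segment that would push
      .bi_size past UINT_MAX. A sketch of the extended helper, consistent
      with the commit message (the exact upstream code may differ
      slightly):

        /* include/linux/bio.h -- sketched */
        static inline bool bio_full(struct bio *bio, unsigned len)
        {
                /* no room left in the bvec table */
                if (bio->bi_vcnt >= bio->bi_max_vecs)
                        return true;

                /* adding 'len' bytes would overflow the 32-bit bi_size */
                if (bio->bi_iter.bi_size > UINT_MAX - len)
                        return true;

                return false;
        }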
    • Merge tag 'v5.2-rc6' into for-5.3/block · 5be1f9d8
      Authored by Jens Axboe
      Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
      fix. Otherwise we end up having conflicts with future patches between
      for-5.3/block and master that touch this area. In particular, it makes
      the bio_full() fix hard to backport to stable.
      
      * tag 'v5.2-rc6': (482 commits)
        Linux 5.2-rc6
        Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
        Bluetooth: Fix regression with minimum encryption key size alignment
        tcp: refine memory limit test in tcp_fragment()
        x86/vdso: Prevent segfaults due to hoisted vclock reads
        SUNRPC: Fix a credential refcount leak
        Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
        net :sunrpc :clnt :Fix xps refcount imbalance on the error path
        NFS4: Only set creation opendata if O_CREAT
        ARM: 8867/1: vdso: pass --be8 to linker if necessary
        KVM: nVMX: reorganize initial steps of vmx_set_nested_state
        KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
        habanalabs: use u64_to_user_ptr() for reading user pointers
        nfsd: replace Jeff by Chuck as nfsd co-maintainer
        inet: clear num_timeout reqsk_alloc()
        PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
        net: mvpp2: debugfs: Add pmap to fs dump
        ipv6: Default fib6_type to RTN_UNICAST when not set
        net: hns3: Fix inconsistent indenting
        net/af_iucv: always register net_device notifier
        ...
      5be1f9d8
  6. 30 Jun, 2019 (3 commits)
  7. 29 Jun, 2019 (18 commits)
  8. 28 Jun, 2019 (2 commits)
    • block, bfq: NULL out the bic when it's no longer valid · dbc3117d
      Authored by Douglas Anderson
      In reboot tests on several devices we were seeing a "use after free"
      when slub_debug or KASAN was enabled.  The kernel complained about:
      
        Unable to handle kernel paging request at virtual address 6b6b6c2b
      
      ...which is a classic sign of use after free under slub_debug.  The
      stack crawl in kgdb looked like:
      
       0  test_bit (addr=<optimized out>, nr=<optimized out>)
       1  bfq_bfqq_busy (bfqq=<optimized out>)
       2  bfq_select_queue (bfqd=<optimized out>)
       3  __bfq_dispatch_request (hctx=<optimized out>)
       4  bfq_dispatch_request (hctx=<optimized out>)
       5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
       6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
       7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
       8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
       9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
       10 0xc024cff4 in worker_thread (__worker=0xec6d4640)
      
      Digging in kgdb, it could be found that, though bfqq looked fine,
      bfqq->bic had been freed.
      
      Through further digging, I postulated that perhaps it is illegal to
      access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
      because the "bic" can be freed at some point in time after this call
      is made.  I confirmed that there certainly were cases where the exact
      crashing code path would access the "bic" after bfq_exit_icq() had
      been called.  Specifically, I set "bfqq->bic" to (void *)0x7 and
      saw that the bic was 0x7 at the time of the crash.
      
      To understand a bit more about why this crash was fairly uncommon (I
      saw it only once in a few hundred reboots), you can see that much of
      the time bfq_exit_icq_bfqq() fully frees the bfqq and thus it can't
      access the ->bic anymore.  The only case it doesn't is if
      bfq_put_queue() sees a reference still held.
      
      However, even in the case when bfqq isn't freed, the crash is still
      rare.  Why?  I tracked what happened to the "bic" after the exit
      routine.  It doesn't get freed right away.  Rather,
      put_io_context_active() eventually called put_io_context() which
      queued up freeing on a workqueue.  The freeing then actually happened
      later than that through call_rcu().  Despite all these delays, some
      extra debugging showed that all the hoops could be jumped through in
      time and the memory could be freed causing the original crash.  Phew!
      
      To make a long story short, assuming it truly is illegal to access an
      icq after the "exit_icq" callback has finished, this patch is needed
      (see the sketch after this entry).
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Paolo Valente <paolo.valente@unimore.it>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dbc3117d
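
      A sketch of the shape of the fix (names follow block/bfq-iosched.c;
      treat the exact placement as an approximation of the patch): clear
      bfqq's back-pointer to the bic under bfqd->lock while the icq is
      exiting, so later dispatch paths cannot chase a stale pointer.

        static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
        {
                struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
                /* bfqd is NULL if the scheduler already exited */
                struct bfq_data *bfqd = bfqq ? bfqq->bfqd : NULL;

                if (bfqq && bfqd) {
                        unsigned long flags;

                        spin_lock_irqsave(&bfqd->lock, flags);
                        bfqq->bic = NULL;  /* bic may be freed after exit_icq */
                        bfq_exit_bfqq(bfqd, bfqq, is_sync);
                        bic_set_bfqq(bic, NULL, is_sync);
                        spin_unlock_irqrestore(&bfqd->lock, flags);
                }
        }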
    • bcache: add reclaimed_journal_buckets to struct cache_set · dff90d58
      Authored by Coly Li
      Now we have counters for how many times the journal is reclaimed and
      how many times cached dirty btree nodes are flushed, but we don't
      know how many journal buckets are really reclaimed.

      This patch adds reclaimed_journal_buckets to struct cache_set. It is
      a monotonically increasing counter that tells how many journal
      buckets have been reclaimed since the cache set started running. From
      these three counters (reclaim, reclaimed_journal_buckets,
      flush_write) we can get an idea of how well the current journal space
      reclaim code works (see the sketch after this entry).

      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dff90d58
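
      A hedged sketch of the change: the field name comes from the commit
      message, but the atomic type, the increment site, and any sysfs
      plumbing are assumptions here, not the verbatim patch.

        /* drivers/md/bcache/bcache.h -- sketched addition to struct cache_set */
        struct cache_set {
                /* ... existing members ... */
                atomic_long_t   reclaim;                    /* reclaim passes */
                atomic_long_t   flush_write;                /* btree-node flushes */
                atomic_long_t   reclaimed_journal_buckets;  /* new: buckets freed */
                /* ... */
        };

        /* In the journal reclaim path, bump the counter once per bucket
         * actually handed back (increment site assumed):
         *
         *     atomic_long_inc(&c->reclaimed_journal_buckets);
         */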