1. 29 Aug, 2022 1 commit
    • genetlink: start to validate reserved header bytes · 9c5d03d3
      Jakub Kicinski authored
      We had historically not checked that genlmsghdr.reserved
      is 0 on input which prevents us from using those precious
      bytes in the future.
      
      One use case would be to extend the cmd field, which is
      currently just 8 bits wide and 256 is not a lot of commands
      for some core families.
      
      To make sure that new families do the right thing by default
      put the onus of opting out of validation on existing families.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
      Signed-off-by: David S. Miller <davem@davemloft.net>
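The check described in this commit message can be sketched in userspace C. The struct mirrors the uapi genetlink header layout; the validator function and its name are illustrative, not the actual kernel code:

```c
#include <stdint.h>
#include <errno.h>

/* Mirrors the uapi genetlink header layout (linux/genetlink.h). */
struct genlmsghdr_ {
        uint8_t  cmd;
        uint8_t  version;
        uint16_t reserved;
};

/* Hypothetical validator: reject input whose reserved bytes are not 0,
 * keeping the field usable for future extensions (e.g. a wider cmd). */
static int validate_genl_header(const struct genlmsghdr_ *hdr)
{
        if (hdr->reserved != 0)
                return -EINVAL;
        return 0;
}
```

Rejecting non-zero reserved bytes on input is what lets those bytes be given a meaning later without breaking old userspace.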
  2. 20 May, 2022 1 commit
  3. 03 May, 2022 1 commit
    • scsi: target: tcmu: Fix possible data corruption · bb9b9eb0
      Xiaoguang Wang authored
      When tcmu_vma_fault() gets a page successfully, find_free_blocks() may
      run before the current context completes the page fault procedure and
      call unmap_mapping_range() to unmap the page. If find_free_blocks()
      completes first and the interrupted page fault procedure then resumes
      and completes, one truncated page has been mapped to userspace. But
      note that tcmu_vma_fault() has taken a refcount on the page, so no
      other subsystem will be able to use the page unless the userspace
      address is unmapped later.
      
      If another command subsequently runs and needs to extend dbi_thresh,
      it may reuse the corresponding slot for the previous page in
      data_bitmap. Then, although we allocate a new page for this slot in
      data_area, no page fault will happen because a valid mapping still
      exists, and the real request's data will be lost.
      
      Filesystem implementations also run into this issue, but they usually
      lock the page when vm_operations_struct->fault gets a page and unlock
      it after finish_fault() completes. For truncation, filesystems lock
      pages in truncate_inode_pages() to protect against racing page faults.
      
      To fix this possible data corruption, apply a method similar to the
      filesystems': tcmu_blocks_release() locks and unlocks each page that
      is to be freed, and tcmu_vma_fault() also locks the found page under
      cmdr_lock. At the same time, since tcmu_vma_fault() takes an extra
      page refcount, tcmu_blocks_release() won't free pages that are still
      in the page fault procedure, which means it is safe to call
      tcmu_blocks_release() before unmap_mapping_range().
      
      With these changes tcmu_blocks_release() will wait for all page faults to
      be completed before calling unmap_mapping_range(). And later, if
      unmap_mapping_range() is called, it will ensure stale mappings are removed.
      
      Link: https://lore.kernel.org/r/20220421023735.9018-1-xiaoguang.wang@linux.alibaba.com
      Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
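The locking discipline in this fix (the free path locks each page before reclaiming it; the fault path locks the page it found) can be sketched with a userspace analogue in which a per-page mutex stands in for the kernel page lock. All names here are illustrative, not the tcmu code:

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for a data-area page; the mutex plays the role of the
 * kernel page lock. */
struct da_page {
        pthread_mutex_t lock;
        bool            truncated;
};

/* Free path (cf. tcmu_blocks_release): lock each page before marking
 * it reclaimed, so a concurrent fault handler cannot race past us. */
static void release_page(struct da_page *p)
{
        pthread_mutex_lock(&p->lock);
        p->truncated = true;
        pthread_mutex_unlock(&p->lock);
}

/* Fault path (cf. tcmu_vma_fault): take the page lock and refuse to
 * map a page the free path has already reclaimed. */
static bool fault_in_page(struct da_page *p)
{
        bool ok;

        pthread_mutex_lock(&p->lock);
        ok = !p->truncated;
        pthread_mutex_unlock(&p->lock);
        return ok;      /* caller maps the page only if ok */
}
```

Serializing both paths on the same per-page lock is what guarantees a truncated page can never end up mapped to userspace.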
  4. 30 Mar, 2022 1 commit
  5. 23 Feb, 2022 1 commit
  6. 19 Oct, 2021 1 commit
  7. 05 Oct, 2021 1 commit
  8. 03 Aug, 2021 1 commit
  9. 22 May, 2021 2 commits
    • scsi: target: tcmu: Fix boolreturn.cocci warnings · 82473125
      kernel test robot authored
      drivers/target/target_core_user.c:1424:9-10: WARNING: return of 0/1 in function 'tcmu_handle_completions' with return type bool
      
       Return statements in functions returning bool should use
       true/false instead of 1/0.
      
      Generated by: scripts/coccinelle/misc/boolreturn.cocci
      
      Link: https://lore.kernel.org/r/20210515230358.GA97544@60d1edce16e0
      Fixes: 9814b55c ("scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found")
      CC: Bodo Stroesser <bostroesser@gmail.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Bodo Stroesser <bostroesser@gmail.com>
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
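The class of fix flagged by boolreturn.cocci is mechanical; a minimal illustration (the function names are made up, not the tcmu code):

```c
#include <stdbool.h>

/* Before: returns 0/1 from a bool function -- what boolreturn.cocci
 * warns about, even though it behaves identically. */
static bool ring_is_empty_bad(int pending)
{
        return pending == 0 ? 1 : 0;
}

/* After: bool-returning functions use true/false. */
static bool ring_is_empty(int pending)
{
        if (pending == 0)
                return true;
        return false;
}
```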
    • scsi: target: tcmu: Fix xarray RCU warning · b4150b68
      Bodo Stroesser authored
      Commit f5ce815f ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N *
      PAGE_SIZE") introduced xas_next() calls to iterate xarray elements.  These
      calls triggered the WARNING "suspicious RCU usage" at tcmu device set up
      [1]. In the call stack of xas_next(), xas_load() was called.  According to
      its comment, this function requires "the xa_lock or the RCU lock".
      
      To avoid the warning:
      
       - Guard the small loop calling xas_next() in tcmu_get_empty_block with RCU
         lock.
      
       - In the large loop in tcmu_copy_data, holding the RCU lock could
         disable preemption for a long time (copying multiple MBs).
         Therefore replace XA_STATE, xas_set and xas_next with a single
         xa_load.
      
      [1]
      
      [ 1899.867091] =============================
      [ 1899.871199] WARNING: suspicious RCU usage
      [ 1899.875310] 5.13.0-rc1+ #41 Not tainted
      [ 1899.879222] -----------------------------
      [ 1899.883299] include/linux/xarray.h:1182 suspicious rcu_dereference_check() usage!
      [ 1899.890940] other info that might help us debug this:
      [ 1899.899082] rcu_scheduler_active = 2, debug_locks = 1
      [ 1899.905719] 3 locks held by kworker/0:1/1368:
      [ 1899.910161]  #0: ffffa1f8c8b98738 ((wq_completion)target_submission){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580
      [ 1899.920732]  #1: ffffbd7040cd7e78 ((work_completion)(&q->sq.work)){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580
      [ 1899.931146]  #2: ffffa1f8d1c99768 (&udev->cmdr_lock){+.+.}-{3:3}, at: tcmu_queue_cmd+0xea/0x160 [target_core_user]
      [ 1899.941678] stack backtrace:
      [ 1899.946093] CPU: 0 PID: 1368 Comm: kworker/0:1 Not tainted 5.13.0-rc1+ #41
      [ 1899.953070] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 1302 03/15/2018
      [ 1899.962459] Workqueue: target_submission target_queued_submit_work [target_core_mod]
      [ 1899.970337] Call Trace:
      [ 1899.972839]  dump_stack+0x6d/0x89
      [ 1899.976222]  xas_descend+0x10e/0x120
      [ 1899.979875]  xas_load+0x39/0x50
      [ 1899.983077]  tcmu_get_empty_blocks+0x115/0x1c0 [target_core_user]
      [ 1899.989318]  queue_cmd_ring+0x1da/0x630 [target_core_user]
      [ 1899.994897]  ? rcu_read_lock_sched_held+0x3f/0x70
      [ 1899.999695]  ? trace_kmalloc+0xa6/0xd0
      [ 1900.003501]  ? __kmalloc+0x205/0x380
      [ 1900.007167]  tcmu_queue_cmd+0x12f/0x160 [target_core_user]
      [ 1900.012746]  __target_execute_cmd+0x23/0xa0 [target_core_mod]
      [ 1900.018589]  transport_generic_new_cmd+0x1f3/0x370 [target_core_mod]
      [ 1900.025046]  transport_handle_cdb_direct+0x34/0x50 [target_core_mod]
      [ 1900.031517]  target_queued_submit_work+0x43/0xe0 [target_core_mod]
      [ 1900.037837]  process_one_work+0x268/0x580
      [ 1900.041952]  ? process_one_work+0x580/0x580
      [ 1900.046195]  worker_thread+0x55/0x3b0
      [ 1900.049921]  ? process_one_work+0x580/0x580
      [ 1900.054192]  kthread+0x143/0x160
      [ 1900.057499]  ? kthread_create_worker_on_cpu+0x40/0x40
      [ 1900.062661]  ret_from_fork+0x1f/0x30
      
      Link: https://lore.kernel.org/r/20210519135440.26773-1-bostroesser@gmail.com
      Fixes: f5ce815f ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N * PAGE_SIZE")
      Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  10. 15 May, 2021 1 commit
  11. 29 Apr, 2021 1 commit
  12. 13 Apr, 2021 6 commits
  13. 16 Mar, 2021 1 commit
  14. 10 Mar, 2021 3 commits
  15. 05 Mar, 2021 1 commit
  16. 23 Feb, 2021 2 commits
  17. 15 Jan, 2021 1 commit
    • scsi: target: tcmu: Fix use-after-free of se_cmd->priv · 780e1384
      Shin'ichiro Kawasaki authored
      Commit a3512902 ("scsi: target: tcmu: Use priv pointer in se_cmd")
      modified tcmu_free_cmd() to set the priv pointer in se_cmd to NULL.
      However, se_cmd can already have been freed by the work queue
      triggered in target_complete_cmd(). This caused a KASAN
      use-after-free BUG [1].
      
      To fix the bug, do not touch the priv pointer in tcmu_free_cmd().
      Instead, set the priv pointer to NULL before the target_complete_cmd()
      calls. Also, to avoid an unnecessary priv pointer change in
      tcmu_queue_cmd(), modify the priv pointer in that function only when
      tcmu_free_cmd() is not called.
      
      [1]
      BUG: KASAN: use-after-free in tcmu_handle_completions+0x1172/0x1770 [target_core_user]
      Write of size 8 at addr ffff88814cf79a40 by task cmdproc-uio0/14842
      
      CPU: 2 PID: 14842 Comm: cmdproc-uio0 Not tainted 5.11.0-rc2 #1
      Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
      Call Trace:
       dump_stack+0x9a/0xcc
       ? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
       print_address_description.constprop.0+0x18/0x130
       ? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
       ? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
       kasan_report.cold+0x7f/0x10e
       ? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
       tcmu_handle_completions+0x1172/0x1770 [target_core_user]
       ? queue_tmr_ring+0x5d0/0x5d0 [target_core_user]
       tcmu_irqcontrol+0x28/0x60 [target_core_user]
       uio_write+0x155/0x230
       ? uio_vma_fault+0x460/0x460
       ? security_file_permission+0x4f/0x440
       vfs_write+0x1ce/0x860
       ksys_write+0xe9/0x1b0
       ? __ia32_sys_read+0xb0/0xb0
       ? syscall_enter_from_user_mode+0x27/0x70
       ? trace_hardirqs_on+0x1c/0x110
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fcf8b61905f
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b9 fc ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c fd ff ff 48
      RSP: 002b:00007fcf7b3e6c30 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcf8b61905f
      RDX: 0000000000000004 RSI: 00007fcf7b3e6c78 RDI: 000000000000000c
      RBP: 00007fcf7b3e6c80 R08: 0000000000000000 R09: 00007fcf7b3e6aa8
      R10: 000000000b01c000 R11: 0000000000000293 R12: 00007ffe0c32a52e
      R13: 00007ffe0c32a52f R14: 0000000000000000 R15: 00007fcf7b3e7640
      
      Allocated by task 383:
       kasan_save_stack+0x1b/0x40
       ____kasan_kmalloc.constprop.0+0x84/0xa0
       kmem_cache_alloc+0x142/0x330
       tcm_loop_queuecommand+0x2a/0x4e0 [tcm_loop]
       scsi_queue_rq+0x12ec/0x2d20
       blk_mq_dispatch_rq_list+0x30a/0x1db0
       __blk_mq_do_dispatch_sched+0x326/0x830
       __blk_mq_sched_dispatch_requests+0x2c8/0x3f0
       blk_mq_sched_dispatch_requests+0xca/0x120
       __blk_mq_run_hw_queue+0x93/0xe0
       process_one_work+0x7b6/0x1290
       worker_thread+0x590/0xf80
       kthread+0x362/0x430
       ret_from_fork+0x22/0x30
      
      Freed by task 11655:
       kasan_save_stack+0x1b/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x20/0x30
       ____kasan_slab_free+0xec/0x120
       slab_free_freelist_hook+0x53/0x160
       kmem_cache_free+0xf4/0x5c0
       target_release_cmd_kref+0x3ea/0x9e0 [target_core_mod]
       transport_generic_free_cmd+0x28b/0x2f0 [target_core_mod]
       target_complete_ok_work+0x250/0xac0 [target_core_mod]
       process_one_work+0x7b6/0x1290
       worker_thread+0x590/0xf80
       kthread+0x362/0x430
       ret_from_fork+0x22/0x30
      
      Last potentially related work creation:
       kasan_save_stack+0x1b/0x40
       kasan_record_aux_stack+0xa3/0xb0
       insert_work+0x48/0x2e0
       __queue_work+0x4e8/0xdf0
       queue_work_on+0x78/0x80
       tcmu_handle_completions+0xad0/0x1770 [target_core_user]
       tcmu_irqcontrol+0x28/0x60 [target_core_user]
       uio_write+0x155/0x230
       vfs_write+0x1ce/0x860
       ksys_write+0xe9/0x1b0
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Second to last potentially related work creation:
       kasan_save_stack+0x1b/0x40
       kasan_record_aux_stack+0xa3/0xb0
       insert_work+0x48/0x2e0
       __queue_work+0x4e8/0xdf0
       queue_work_on+0x78/0x80
       tcm_loop_queuecommand+0x1c3/0x4e0 [tcm_loop]
       scsi_queue_rq+0x12ec/0x2d20
       blk_mq_dispatch_rq_list+0x30a/0x1db0
       __blk_mq_do_dispatch_sched+0x326/0x830
       __blk_mq_sched_dispatch_requests+0x2c8/0x3f0
       blk_mq_sched_dispatch_requests+0xca/0x120
       __blk_mq_run_hw_queue+0x93/0xe0
       process_one_work+0x7b6/0x1290
       worker_thread+0x590/0xf80
       kthread+0x362/0x430
       ret_from_fork+0x22/0x30
      
      The buggy address belongs to the object at ffff88814cf79800 which belongs
      to the cache tcm_loop_cmd_cache of size 896.
      
      Link: https://lore.kernel.org/r/20210113024508.1264992-1-shinichiro.kawasaki@wdc.com
      Fixes: a3512902 ("scsi: target: tcmu: Use priv pointer in se_cmd")
      Cc: stable@vger.kernel.org # v5.9+
      Acked-by: Bodo Stroesser <bostroesser@gmail.com>
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
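The pattern in the fix (clear the back-pointer while you still own the object, before the completion path may free it) looks like this in a userspace sketch; every name here is illustrative, not the tcmu code:

```c
#include <stdlib.h>
#include <stddef.h>

struct se_cmd_ {
        void *priv;             /* back-pointer to the driver cmd */
};

struct driver_cmd {
        struct se_cmd_ *se_cmd;
};

/* Completion path: after this returns, se_cmd may be freed at any
 * time by another context, so it must not be touched again. */
static void complete_cmd(struct se_cmd_ *se_cmd)
{
        /* hand-off point: ownership passes to the completion side */
        (void)se_cmd;
}

/* Fixed ordering: clear priv while we still own se_cmd, *then*
 * complete; freeing the driver cmd no longer touches se_cmd. */
static void finish_cmd(struct driver_cmd *cmd)
{
        struct se_cmd_ *se_cmd = cmd->se_cmd;

        se_cmd->priv = NULL;    /* before the hand-off, not after */
        complete_cmd(se_cmd);
        free(cmd);              /* cf. tcmu_free_cmd(): no se_cmd access */
}
```

The buggy ordering did the `priv = NULL` store after the hand-off, when the other side may already have freed `se_cmd` -- exactly the write KASAN reported.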
  18. 30 Oct, 2020 1 commit
  19. 27 Oct, 2020 1 commit
    • scsi: target: tcmu: scatter_/gather_data_area() rework · c8ed1ff8
      Bodo Stroesser authored
      scatter_data_area() and gather_data_area() are not easy to understand since
      data is copied in nested loops over sg_list and tcmu dbi list. Since sg
      list can contain only partly filled pages, the loop has to be prepared to
      handle sg pages not matching dbi pages one by one.
      
      The existing implementation uses kmap_atomic()/kunmap_atomic() for
      performance reasons. But instead of keeping these calls strictly
      nested for sg and dbi pages, the code holds the mappings in an
      overlapping way, which indeed is a bug that would trigger on archs
      using highmem.
      
      The scatterlist lib contains the sg_miter_start/_next/_stop functions which
      can be used to simplify such complicated loops.
      
      The new code now processes the dbi list in the outer loop, while sg list is
      handled by the inner one. That way the code can take advantage of the
      sg_miter_* family calls.
      
      Calling sg_miter_stop() after the end of the inner loop enforces strict
      nesting of atomic kmaps.
      
      Since the nested loops in scatter_/gather_data_area were very similar, I
      replaced them by the new helper function tcmu_copy_data().
      
      Link: https://lore.kernel.org/r/20201019115118.11949-1-bostroesser@gmail.com
      Acked-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
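The loop shape described above -- the destination consumed block by block while variably filled source chunks are drained in an inner position -- can be sketched as a userspace copy between mismatched chunk lists. This is an illustration of the structure only, not the sg_miter API:

```c
#include <string.h>
#include <stddef.h>

struct chunk { const char *buf; size_t len; };

/* Copy a list of variably sized source chunks into a flat destination,
 * advancing both sides independently -- the same shape as iterating sg
 * pages against fixed-size data-area (dbi) pages. */
static size_t copy_chunks(char *dst, size_t dst_len,
                          const struct chunk *src, size_t nsrc)
{
        size_t done = 0;

        for (size_t i = 0; i < nsrc && done < dst_len; i++) {
                size_t n = src[i].len;

                if (n > dst_len - done)
                        n = dst_len - done;
                /* in the kernel: map the page, copy, and unmap before
                 * touching the next chunk, so that atomic kmaps stay
                 * strictly nested (the bug the rework removes) */
                memcpy(dst + done, src[i].buf, n);
                done += n;
        }
        return done;
}
```

In the kernel version the inner iteration is driven by sg_miter_next(), and calling sg_miter_stop() when the inner loop ends is what enforces the strict nesting of atomic mappings.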
  20. 03 Oct, 2020 2 commits
  21. 23 Sep, 2020 3 commits
    • scsi: target: tcmu: Optimize scatter_data_area() · 3c9a7c58
      Bodo Stroesser authored
      scatter_data_area() has two purposes:
      
       1) Create the iovs for the data area buffer of a SCSI cmd.
      
       2) If there is data in DMA_TO_DEVICE direction, copy
          the data from sg_list to data area buffer.
      
      Both are done in a common loop.
      
      In case of a DMA_FROM_DEVICE data transfer, scatter_data_area() is
      called with parameter copy_data = false. But this flag is only used to
      skip the memcpy() for data, while radix_tree_lookup is still called
      for every dbi of the data area buffer, and kmap and kunmap are called
      for every page from sg_list and data_area, as well as
      flush_dcache_page() for the data area pages.  Since the only thing to
      do with copy_data = false would be to set up the iovs, this is a
      noticeable overhead.  Rework the iov creation in the main loop of
      scatter_data_area() providing the new function new_block_to_iov().
      Based on this, create the short new function tcmu_setup_iovs() that
      only writes the iovs with no overhead.  This new function is now
      called instead of scatter_data_area() for bidi buffers and for data
      buffers in those cases where memcpy() would have been skipped.
      
      Link: https://lore.kernel.org/r/20200910155041.17654-4-bstroesser@ts.fujitsu.com
      Acked-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
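Writing only the iovs, with no page lookups or copies, can be sketched as follows: consecutive data-area blocks are merged into one iov entry. This is a simplified stand-in for the tcmu_setup_iovs()/new_block_to_iov() pair; the block size and names are illustrative:

```c
#include <stddef.h>

#define BLOCK_SIZE 4096

struct iov { size_t off; size_t len; };

/* Build iovs from a list of data-area block indices: each new iov
 * starts at a block boundary; an adjacent block extends the last iov
 * instead of opening a new one. */
static size_t setup_iovs(struct iov *iovs, const int *dbi, size_t cnt)
{
        size_t n = 0;

        for (size_t i = 0; i < cnt; i++) {
                if (n && dbi[i] == dbi[i - 1] + 1) {
                        iovs[n - 1].len += BLOCK_SIZE;  /* contiguous */
                } else {
                        iovs[n].off = (size_t)dbi[i] * BLOCK_SIZE;
                        iovs[n].len = BLOCK_SIZE;
                        n++;
                }
        }
        return n;       /* number of iovs actually needed */
}
```

Because this walk touches only indices, it avoids the per-page lookup, kmap/kunmap and cache-flush overhead that made the copy_data = false path expensive.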
    • scsi: target: tcmu: Optimize queue_cmd_ring() · 7e98905e
      Bodo Stroesser authored
      queue_cmd_ring() needs to check whether there is enough space in cmd ring
      and data area for the cmd to queue.
      
      Currently the sequence is:
      
       1) Calculate size the cmd will occupy on the ring based on estimation of
          needed iovs.
      
       2) Check whether there is enough space on the ring based on size from 1)
      
       3) Allocate buffers in data area.
      
       4) Calculate number of iovs the command really needs while copying
          incoming data (if any) to data area.
      
       5) Re-calculate real size of cmd on ring based on real number of iovs.
      
       6) Set up possible padding and cmd on the ring.
      
      Step 1) must not underestimate the cmd size, so it uses the maximum
      possible number of iovs for the given I/O data size. The resulting
      overestimation can be really high, so this sequence is not ideal. The
      earliest point at which the real number of iovs can be calculated is
      after data buffer allocation. Therefore rework the code to implement
      the following sequence:
      
       A) Allocate buffers on data area and calculate number of necessary iovs
          during this.
      
       B) Calculate real size of cmd on ring based on number of iovs.
      
       C) Check whether there is enough space on the ring.
      
       D) Set up possible padding and cmd on the ring.
      
      The new sequence enforces the split of new function tcmu_alloc_data_space()
      from is_ring_space_avail(). Using this function, change queue_cmd_ring()
      according to the new sequence.
      
      Change routines called by tcmu_alloc_data_space() to allow calculating and
      returning the iov count. Remove counting of iovs in scatter_data_area().
      
      Link: https://lore.kernel.org/r/20200910155041.17654-3-bstroesser@ts.fujitsu.com
      Acked-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
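The reworked A)-D) sequence can be sketched with stubs. Every function and size below is a stand-in for illustration, not the tcmu code: allocate first and count iovs while doing so, compute the real ring size from that count, and only then check ring space:

```c
#include <stdbool.h>
#include <stddef.h>

/* A) Allocate data-area blocks, counting iovs as we go.  Here the
 * worst case of one iov per 4 KiB block stands in for the real count
 * discovered during allocation. */
static size_t alloc_data_space(size_t data_len, size_t *iov_cnt)
{
        *iov_cnt = (data_len + 4095) / 4096;
        return *iov_cnt;                /* blocks allocated */
}

/* B) Real size of the cmd on the ring: header plus iov entries
 * (illustrative sizes). */
static size_t cmd_ring_size(size_t iov_cnt)
{
        return 64 + iov_cnt * 16;
}

/* C) Space check against the real size, not an overestimate. */
static bool ring_space_avail(size_t need, size_t free_bytes)
{
        return need <= free_bytes;
}

static bool queue_cmd(size_t data_len, size_t ring_free)
{
        size_t iov_cnt;

        alloc_data_space(data_len, &iov_cnt);           /* A */
        size_t need = cmd_ring_size(iov_cnt);           /* B */
        return ring_space_avail(need, ring_free);       /* C; D would
                                                           write padding
                                                           and the cmd */
}
```

The point of the reordering is that the space check in C) uses the exact iov count from A) rather than the worst-case estimate the old step 1) had to assume.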
    • scsi: target: tcmu: Join tcmu_cmd_get_data_length() and tcmu_cmd_get_block_cnt() · 52ef2743
      Bodo Stroesser authored
      Simplify code by joining tcmu_cmd_get_data_length() and
      tcmu_cmd_get_block_cnt() into tcmu_cmd_set_block_cnts().  The new function
      sets tcmu_cmd->dbi_cnt and also the new field tcmu_cmd->dbi_bidi_cnt which
      is needed for further enhancements in following patches.  Simplify some
      code by using tcmu_cmd->dbi(_bidi)_cnt instead of calculation from length.
      
      Please note: The calculation of the number of dbis needed for bidi
      was wrong. It was based on the length of the first bidi sg only.
      Change it to correctly sum up the entire length of all bidi sgs.
      
      Link: https://lore.kernel.org/r/20200910155041.17654-2-bstroesser@ts.fujitsu.com
      Acked-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
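The corrected bidi count (sum the lengths of all bidi sgs, then convert the total to whole data-area blocks) can be sketched like this; DATA_BLOCK_SIZE and the names are illustrative:

```c
#include <stddef.h>

#define DATA_BLOCK_SIZE 4096

struct sg { size_t length; };

/* Sum the length of *all* sgs -- the bug was counting only the first
 * one -- then round up to whole data-area blocks (dbis). */
static size_t sg_block_count(const struct sg *sgs, size_t nents)
{
        size_t total = 0;

        for (size_t i = 0; i < nents; i++)
                total += sgs[i].length;
        return (total + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;
}
```

With only the first sg counted, a bidi list of {4096, 100} bytes would have reserved one block instead of the two it actually needs.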
  22. 16 Sep, 2020 1 commit
  23. 29 Jul, 2020 6 commits