- 08 February 2021, 3 commits
-
-
Committed by Christoph Hellwig
Instead of encoding the bvec pool using magic bio flags, just use a helper to find the pool based on the max_vecs value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
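For readers following along, a minimal standalone sketch of the lookup idea is below; the pool sizes, struct, and helper names are illustrative stand-ins rather than the kernel's own definitions.

```c
#include <stddef.h>

/* Illustrative pool sizes; the kernel's actual biovec slabs differ. */
struct bvec_pool_sketch {
	unsigned short nr_vecs;
	const char *name;
};

static const struct bvec_pool_sketch pools[] = {
	{ 16,  "biovec-16"  },
	{ 64,  "biovec-64"  },
	{ 128, "biovec-128" },
	{ 256, "biovec-max" },
};

/* Pick the smallest pool that can hold nr_vecs, instead of encoding the
 * choice in per-bio flags. Returns NULL if nr_vecs exceeds the largest pool. */
static const struct bvec_pool_sketch *bvec_pool_for(unsigned short nr_vecs)
{
	for (size_t i = 0; i < sizeof(pools) / sizeof(pools[0]); i++)
		if (nr_vecs <= pools[i].nr_vecs)
			return &pools[i];
	return NULL;
}
```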
-
Committed by Christoph Hellwig
The bi_max_vecs and bi_vcnt fields are defined as unsigned short, so don't allow passing larger values in. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
struct biovec_slab is only used inside of bio.c, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 January 2021, 3 commits
-
-
Committed by Christoph Hellwig
Just reuse the block_device and sector from the swap_info structure, just as used by the SWP_SYNCHRONOUS path. Also remove the checks for NULL returns from bio_alloc, as that can't happen for sleeping allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
bio_kmalloc shares almost no logic with the bio_set based fast path in bio_alloc_bioset. Split it into an entirely separate implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 January 2021, 1 commit
-
-
Committed by Christoph Hellwig
truncate_bdev_range is only used in always built-in block layer code, so remove the export and the !CONFIG_BLOCK stub. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 25 January 2021, 15 commits
-
-
Committed by Pavel Begunkov
The block layer spends quite a while in blkdev_direct_IO() copying and initialising a bio's bvec. However, if we've already got a bvec in the input iterator it might be reused in some cases, i.e. when the new ITER_BVEC_FLAG_FIXED flag is set. Simple tests show a considerable performance boost, and it also reduces memory footprint. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
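The zero-copy idea can be illustrated with a small, self-contained sketch, assuming a stability flag on the iterator analogous to ITER_BVEC_FLAG_FIXED; all names here are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct vec_sketch { void *base; size_t len; };

struct iter_sketch {
	bool bvec_fixed;          /* caller guarantees the array outlives the I/O */
	struct vec_sketch *vecs;
	int nr;
};

struct bio_sketch {
	struct vec_sketch *vecs;
	int nr;
	bool owns_vecs;
};

/* Reuse the caller's vector array when it is marked stable; otherwise fall
 * back to copying, which is what the direct I/O path previously always did. */
static int attach_vecs(struct bio_sketch *bio, const struct iter_sketch *it)
{
	if (it->bvec_fixed) {
		bio->vecs = it->vecs;           /* no allocation, no memcpy */
		bio->nr = it->nr;
		bio->owns_vecs = false;
		return 0;
	}
	bio->vecs = malloc(it->nr * sizeof(*bio->vecs));
	if (!bio->vecs)
		return -1;
	memcpy(bio->vecs, it->vecs, it->nr * sizeof(*bio->vecs));
	bio->nr = it->nr;
	bio->owns_vecs = true;
	return 0;
}
```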
-
Committed by Pavel Begunkov
Add a helper function calculating the number of bvec segments we need to allocate to construct a bio. It doesn't change anything functionally, but will be used to avoid duplicating special cases in the future. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
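A toy version of such a calculation, for a contiguous address range rather than a full iov_iter, shows the arithmetic involved; the page size and names are illustrative.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL

/* How many page-sized segments does [addr, addr + len) span?  This is the
 * kind of count needed before deciding how many bvecs to allocate. */
static size_t nr_segments(unsigned long addr, size_t len)
{
	if (len == 0)
		return 0;
	unsigned long first = addr / PAGE_SIZE_SKETCH;
	unsigned long last  = (addr + len - 1) / PAGE_SIZE_SKETCH;
	return last - first + 1;
}
```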
-
Committed by Guoqing Jiang
We can remove 'q' from blk_execute_rq as well, after the previous change in blk_execute_rq_nowait. More importantly, it never really was needed to start with, given that we can trivially derive it from struct request. Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Guoqing Jiang
The 'q' argument has not been used since commit a1ce35fa ("block: remove dead elevator code"), so also update the function's comment. More importantly, it never really was needed to start with, given that we can trivially derive it from struct request. Cc: target-devel@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
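The pattern behind both of these changes is simply that a request already records the queue it was allocated for, so the extra argument carried no information. A minimal toy model of that relationship:

```c
/* Toy structures only; they model the relationship, not the real ones. */
struct queue_sketch { int id; };

struct request_sketch {
	struct queue_sketch *q;   /* set when the request is allocated */
	int tag;
};

/* Before: execute(q, rq, at_head) -- the queue argument duplicated rq->q.
 * After:  execute(rq, at_head)    -- derive the queue from the request. */
static void execute(struct request_sketch *rq, int at_head)
{
	struct queue_sketch *q = rq->q;   /* trivially derived */
	(void)q;
	(void)at_head;
	/* ... queue rq on q and wait for completion ... */
}
```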
-
Committed by Ming Lei
bvec_alloc(), bvec_free() and bvec_nr_vecs() are only used inside block layer core functions, so there is no need to declare them in a public header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
The inline bvecs won't be used if the caller doesn't need bvecs (i.e. does not pass BIOSET_NEED_BVECS), so don't allocate them in that situation. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jan Kara
Currently, when a non-mq-aware IO scheduler (BFQ, mq-deadline) is used for a queue with multiple HW queues, the performance is rather bad. The problem is that these IO schedulers use queue-wide locking and their dispatch function does not respect the hctx it is passed in, returning any request it finds appropriate. Thus locality of request access is broken and dispatch from multiple CPUs just contends on IO scheduler locks. For these IO schedulers there's little point in dispatching from multiple CPUs. Instead, always dispatch from a single CPU only, to limit contention. Below is a comparison of dbench runs on an XFS filesystem where the storage is a raid card with 64 HW queues and a single rotating disk attached, with BFQ as the IO scheduler:
clients    MQ                 SQ                    MQ-Patched
Amean 1       39.12 (0.00%)      43.29 * -10.67%*      36.09 *   7.74%*
Amean 2      128.58 (0.00%)     101.30 *  21.22%*      96.14 *  25.23%*
Amean 4      577.42 (0.00%)     494.47 *  14.37%*     508.49 *  11.94%*
Amean 8      610.95 (0.00%)     363.86 *  40.44%*     362.12 *  40.73%*
Amean 16     391.78 (0.00%)     261.49 *  33.25%*     282.94 *  27.78%*
Amean 32     324.64 (0.00%)     267.71 *  17.54%*     233.00 *  28.23%*
Amean 64     295.04 (0.00%)     253.02 *  14.24%*     242.37 *  17.85%*
Amean 512  10281.61 (0.00%)   10211.16 *   0.69%*   10447.53 *  -1.61%*
Numbers are times, so lower is better. MQ is the stock 5.10-rc6 kernel. SQ is the same kernel with megaraid_sas.host_tagset_enable=0 so that the card advertises just a single HW queue. MQ-Patched is a kernel with this patch applied. You can see that multiple hardware queues heavily hurt performance in combination with BFQ. The patch restores the performance. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
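The core of the approach is easiest to see in miniature: let one CPU own the dispatch loop and have the others back off instead of piling onto the scheduler's queue-wide lock. A hedged, standalone sketch; the flag and function names are illustrative, not the kernel's:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag dispatch_owner = ATOMIC_FLAG_INIT;

/* Returns true if this CPU ran the dispatch loop, false if another CPU
 * already owned it and we backed off rather than contending on the lock. */
static bool try_dispatch(void (*drain_scheduler)(void))
{
	if (atomic_flag_test_and_set(&dispatch_owner))
		return false;
	drain_scheduler();
	atomic_flag_clear(&dispatch_owner);
	return true;
}
```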
-
Committed by Jan Kara
This reverts commit b445547e. Since both mq-deadline and BFQ completely ignore the hctx they are passed to their dispatch function and dispatch whatever request they deem fit, checking whether any request for a particular hctx is queued is just pointless, since we'll very likely get a request from a different hctx anyway. In the following commit we'll deal with lock contention in these IO schedulers in the presence of multiple HW queues in a different way. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Now that no fast path lookups in the partition table are left, there is no point in micro-optimizing the data structure for it. Just use a bog-standard xarray. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
There is no good reason to iterate backwards when deleting all partitions in del_gendisk, just like we don't in blk_drop_partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Add a helper to call kobject_uevent for the disk and all partitions, and unexport the disk_part_iter_* helpers that are now only used in the core block code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Rework the I/O accounting for bio based drivers to use ->bi_bdev. This means all drivers can now simply use bio_start_io_acct to start accounting, and it will take partitions into account automatically. To end I/O accounting, either bio_end_io_acct can be used if the driver never remaps I/O to a different device, or bio_end_io_acct_remapped if the driver did remap the I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
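As a rough usage sketch for a bio-based driver that remaps I/O, using the helper names given above; the exact prototypes are an assumption to verify against the tree, and my_driver_submit is a hypothetical driver function:

```c
#include <linux/blkdev.h>

/* Sketch only: accounting around a remapped bio. */
static void my_driver_submit(struct bio *bio, struct block_device *lower_bdev)
{
	struct block_device *orig_bdev = bio->bi_bdev;  /* device/partition the I/O arrived on */
	unsigned long start = bio_start_io_acct(bio);   /* partitions handled automatically */

	bio_set_dev(bio, lower_bdev);                   /* remap to the backing device */
	/* ... issue the bio; then, from the completion path: */
	bio_end_io_acct_remapped(bio, start, orig_bdev); /* because we remapped */
}
```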
-
Committed by Christoph Hellwig
There is no good reason to reassign ->bi_bdev when remapping the partition-relative block number to the device-wide one, as all the information required by the drivers comes from the gendisk anyway. Keeping the original ->bi_bdev alive will allow us to greatly simplify the partition-aware I/O accounting. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block_device. From that, the gendisk can be trivially accessed with an extra indirection, but it also allows directly looking up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Commit 20bd1d02 ("scsi: sd: Keep disk read-only when re-reading partition") addressed a long-standing problem with the user read-only policy being overridden as a result of a device-initiated revalidate. The commit has since been reverted due to a regression that left some USB devices read-only indefinitely. To fix the underlying problems with revalidate we need to keep track of hardware state and user policy separately. The gendisk has been updated to reflect the current hardware state set by the device driver. This is done to allow returning the device to the hardware state once the user clears the BLKROSET flag. The resulting semantics are as follows:
- If BLKROSET sets a given partition read-only, that partition will remain read-only even if the underlying storage stack initiates a revalidate. However, the BLKRRPART ioctl will cause the partition table to be dropped and any user policy on partitions will be lost.
- If BLKROSET has not been set, both the whole disk device and any partitions will reflect the current write-protect state of the underlying device.
Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>. Reported-by: Oleksii Kurochko <olkuroch@cisco.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
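The resulting policy can be modelled in a few lines: the driver owns the hardware write-protect state, BLKROSET owns the user policy, and the effective state is the OR of the two, with revalidate only ever refreshing the hardware half. A toy model, not the kernel's data structures:

```c
#include <stdbool.h>

struct ro_state {
	bool hw_ro;    /* set by the driver from the device's write-protect state */
	bool user_ro;  /* set or cleared via BLKROSET */
};

/* Revalidate may change hw_ro at any time; user_ro survives it. */
static bool effectively_read_only(const struct ro_state *s)
{
	return s->hw_ro || s->user_ro;
}
```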
-
- 22 January 2021, 2 commits
-
-
Committed by Peter Zijlstra
There is a need to distinguish genuine per-cpu kthreads from kthreads that happen to have a single-CPU affinity. Genuine per-cpu kthreads are kthreads that are CPU affine for correctness; these will obviously have PF_KTHREAD set, but must also have PF_NO_SETAFFINITY set, lest userspace modify their affinity and ruin things. However, these two things are not sufficient: PF_NO_SETAFFINITY is also set on other tasks that have their affinities controlled through other means, like for instance workqueues. Therefore another bit is needed; it turns out kthread_create_per_cpu() already has such a bit: KTHREAD_IS_PER_CPU, which is used to make kthread_park()/kthread_unpark() work correctly. Expose this flag and remove the implicit setting of it from kthread_create_on_cpu(); the io_uring usage of it seems dubious at best. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org
-
Committed by Saravana Kannan
The device link device's name was of the form: <supplier-dev-name>--<consumer-dev-name> This can cause name collisions, as reported here [1], because device names are not globally unique. Since device names have to be unique within the bus/class, add the bus/class name as a prefix to the device names used to construct the device link device name. So the device link device's name will now be of the form: <supplier-bus-name>:<supplier-dev-name>--<consumer-bus-name>:<consumer-dev-name> [1] - https://lore.kernel.org/lkml/20201229033440.32142-1-michael@walle.cc/ Fixes: 287905e6 ("driver core: Expose device link details in sysfs") Cc: stable@vger.kernel.org Reported-by: Michael Walle <michael@walle.cc> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210110175408.1465657-1-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
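A small sketch of the new name construction; this is illustrative only, and the example bus/device names below are hypothetical, not taken from the report:

```c
#include <stdio.h>

/* <supplier-bus>:<supplier-dev>--<consumer-bus>:<consumer-dev> */
static int link_name(char *buf, size_t len,
		     const char *sup_bus, const char *sup_dev,
		     const char *con_bus, const char *con_dev)
{
	return snprintf(buf, len, "%s:%s--%s:%s",
			sup_bus, sup_dev, con_bus, con_dev);
}
```

For instance, link_name(buf, sizeof(buf), "pci", "0000:00:1f.0", "platform", "some-consumer") would yield "pci:0000:00:1f.0--platform:some-consumer", which stays unique as long as each name is unique within its own bus or class.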
-
- 21 January 2021, 1 commit
-
-
Committed by Grant Grundler
RTL8156 sends notifications about every 32ms. Only display/log notifications when something changes. This issue has been reported by others: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472 https://lkml.org/lkml/2020/8/27/1083 ... [785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd [785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00 [785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6 [785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN [785962.929954] usb 1-1: Manufacturer: Realtek [785962.929956] usb 1-1: SerialNumber: 000000001 [785962.991755] usbcore: registered new interface driver cdc_ether [785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15 [785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384 [785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384 [785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15 [785963.019211] usbcore: registered new interface driver cdc_ncm [785963.023856] usbcore: registered new interface driver cdc_wdm [785963.025461] usbcore: registered new interface driver cdc_mbim [785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0 [785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected [785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected [785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected ... This is about 2KB per second and will overwrite all contents of a 1MB dmesg buffer in under 10 minutes, rendering them useless for debugging many kernel problems. This is also an extra 180 MB/day in /var/logs (or 1GB per week), rendering the majority of those logs useless too. When the link is up (the expected state), the spew amount is >2x higher: ... [786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink [786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink [786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink ... Chrome OS cannot support RTL8156 until this is fixed. Signed-off-by: Grant Grundler <grundler@chromium.org> Reviewed-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
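The fix in miniature is edge-triggered logging: remember the last reported state and print only transitions, so a device that re-announces its link state every 32ms no longer floods the log. A standalone sketch, not the driver's actual code:

```c
#include <stdbool.h>
#include <stdio.h>

static void report_link(bool up)
{
	static bool initialized;
	static bool last;

	if (initialized && up == last)
		return;                     /* unchanged: stay quiet */
	initialized = true;
	last = up;
	printf("network connection: %s\n", up ? "connected" : "disconnected");
}
```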
-
- 20 January 2021, 2 commits
-
-
Committed by Yuchung Cheng
The previous commit 32efcc06 ("tcp: export count for rehash attempts") would mis-account the rehashing SNMP and socket stats: a. during the handshake of an active open, only the first SYN timeout is counted; b. after the handshake of passive and active opens, updating stops after (roughly) TCP_RETRIES1 recurring RTOs; c. after the socket aborts, timeout_rehash is over-counted by 1. This patch fixes this by checking the rehash result from sk_rethink_txhash. Fixes: 32efcc06 ("tcp: export count for rehash attempts") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Geert Uytterhoeven
Export mdiobb_read() and mdiobb_write(), so Ethernet controller drivers can call them from their MDIO read/write wrappers. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 19 January 2021, 2 commits
-
-
Committed by Enke Chen
The TCP session does not terminate with TCP_USER_TIMEOUT when data remain untransmitted due to a zero window. The number of unanswered zero-window probes (tcp_probes_out) is reset to zero with incoming acks irrespective of the window size, as described in tcp_probe_timer(): RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as long as the receiver continues to respond probes. We support this by default and reset icsk_probes_out with incoming ACKs. This counter, however, is the wrong one to use in calculating the duration that the window remains closed and data remain untransmitted. Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the actual issue. In this patch a new timestamp is introduced for the socket in order to track the elapsed time for the zero-window probes that have not been answered with any non-zero window ack. Fixes: 9721e709 ("tcp: simplify window probe aborting on USER_TIMEOUT") Reported-by: William McCall <william.mccall@gmail.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
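A toy model of the new check: rather than counting probes (which any incoming ACK resets, even a zero-window one), remember when the window first closed with data pending and abort once the configured timeout has elapsed. All names here are illustrative, not the kernel's fields:

```c
#include <stdbool.h>
#include <stdint.h>

struct probe_state {
	uint64_t window_closed_at_ms;  /* 0 = window open or no data pending */
	uint64_t user_timeout_ms;      /* TCP_USER_TIMEOUT; 0 = disabled */
};

static bool should_abort(const struct probe_state *s, uint64_t now_ms)
{
	if (!s->user_timeout_ms || !s->window_closed_at_ms)
		return false;
	return now_ms - s->window_closed_at_ms >= s->user_timeout_ms;
}
```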
-
Committed by Klaus Jensen
Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled by the host. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> [hch: avoid a local variable and add a comment] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 15 January 2021, 2 commits
-
-
Committed by Will Deacon
GCC versions >= 4.9 and < 5.1 have been shown to emit memory references beyond the stack pointer, resulting in memory corruption if an interrupt is taken after the stack pointer has been adjusted but before the reference has been executed. This leads to subtle, infrequent data corruption such as the EXT4 problems reported by Russell King at the link below. Life is too short for buggy compilers, so raise the minimum GCC version required by arm64 to 5.1. Reported-by: Russell King <linux@armlinux.org.uk> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Florian Weimer <fweimer@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
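A compile-time guard in the same spirit, purely as an illustration; the kernel enforces the minimum through its build system rather than a header check like this:

```c
#if defined(__GNUC__) && !defined(__clang__)
# if (__GNUC__ < 5) || (__GNUC__ == 5 && __GNUC_MINOR__ < 1)
#  error "arm64 requires GCC 5.1 or newer; older releases can corrupt the stack"
# endif
#endif
```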
-
Committed by Mauro Carvalho Chehab
A function has a different name between its prototype and its kernel-doc markup: ../include/net/cfg80211.h:1766: warning: expecting prototype for struct cfg80211_sar_chan_ranges. Prototype was for struct cfg80211_sar_freq_ranges instead Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/c7ed4bc4d9e992ead16d3d2df246f3b56dbfb1fb.1610610937.git.mchehab+huawei@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 13 January 2021, 7 commits
-
-
Committed by David Woodhouse
For a while, event channel notification via the PCI platform device has been broken, because we attempt to communicate with xenstore before we even have notifications working, with the xs_reset_watches() call in xs_init(). We tend to get away with this on Xen versions below 4.0 because we avoid calling xs_reset_watches() anyway, because xenstore might not cope with reading a non-existent key. And newer Xen *does* have the vector callback support, so we rarely fall back to INTX/GSI delivery. To fix it, clean up a bit of the mess of xs_init() and xenbus_probe() startup. Call xs_init() directly from xenbus_init() only in the !XS_HVM case, deferring it to be called from xenbus_probe() in the XS_HVM case instead. Then fix up the invocation of xenbus_probe() to happen either from its device_initcall if the callback is available early enough, or when the callback is finally set up. This means that the hack of calling xenbus_probe() from a workqueue after the first interrupt, or directly from the PCI platform device setup, is no longer needed. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210113132606.422794-2-dwmw2@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
-
Committed by Arnd Bergmann
With UBSAN enabled and building with clang, there are occasionally warnings like: WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed The function arch_atomic64_or() references the variable __initdata numa_nodes_parsed. This is often because arch_atomic64_or lacks a __initdata annotation or the annotation of numa_nodes_parsed is wrong. These appear for functions that end up not being inlined as intended but operate on __initdata variables. Mark these as __always_inline, along with the corresponding asm-generic wrappers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210108092024.4034860-1-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
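The shape of the fix is to force the tiny wrappers to be inlined so no out-of-line copy survives to reference init-only data; a minimal standalone illustration (the real annotations live in the atomic headers, and the function here is made up):

```c
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* With plain "inline" the compiler may still emit an out-of-line copy,
 * which is what produced the section mismatch; __always_inline forbids it. */
static __always_inline void or_into(unsigned long *word, unsigned long bits)
{
	*word |= bits;
}
```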
-
Committed by Will Deacon
This reverts commit 367c820e. lockup_detector_init() makes heavy use of per-cpu variables and must be called with preemption disabled. Usually, it's handled early during boot in kernel_init_freeable(), before SMP has been initialised. Since we do not know whether or not our PMU interrupt can be signalled as an NMI until considerably later in the boot process, the Arm PMU driver attempts to re-initialise the lockup detector off the back of a device_initcall(). Unfortunately, this is called from preemptible context and results in the following splat: | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 | caller is debug_smp_processor_id+0x20/0x2c | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x3c0 | show_stack+0x20/0x6c | dump_stack+0x2f0/0x42c | check_preemption_disabled+0x1cc/0x1dc | debug_smp_processor_id+0x20/0x2c | hardlockup_detector_event_create+0x34/0x18c | hardlockup_detector_perf_init+0x2c/0x134 | watchdog_nmi_probe+0x18/0x24 | lockup_detector_init+0x44/0xa8 | armv8_pmu_driver_init+0x54/0x78 | do_one_initcall+0x184/0x43c | kernel_init_freeable+0x368/0x380 | kernel_init+0x1c/0x1cc | ret_from_fork+0x10/0x30 Rather than bodge this with raw_smp_processor_id() or randomly disabling preemption, simply revert the culprit for now until we figure out how to do this properly. Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Committed by Chuck Lever
Commit 156708ad ("SUNRPC: Move the svc_xdr_recvfrom() tracepoint") tried to capture the correct XID in the trace record, but this line in svc_recv: rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]); alters the size of rq_arg.head[0].iov_len. The tracepoint records the correct XID but an incorrect value for the length of the xdr_buf's head. To keep the trace callsites simple, I've created two trace classes. One assumes the xdr_buf contains a full RPC message, and the XID can be extracted from it. The other assumes the contents of the xdr_buf are arbitrary, and the XID will be provided by the caller. Currently there is only one user of each class, but I expect we will need a few more tracepoints using each class as time goes on. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Committed by Hailong Liu
The size of kasan_early_shadow_pte[] is now PTRS_PER_PTE, which is defined as 512 for arm. This means that it only covers the Linux pte entries, but not the HWTABLE pte entries for arm. The reason it currently works is that the symbol kasan_early_shadow_page immediately following kasan_early_shadow_pte in memory is page aligned, which makes kasan_early_shadow_pte look like a 4KB-sized array. But we can't ensure the order is always right with a different compiler/linker, or if more bss symbols are introduced. We ran a test with QEMU + vexpress: put a 512KB-size symbol with attribute __section(".bss..page_aligned") after kasan_early_shadow_pte, and poisoned it after kasan_early_init(). Then, with CONFIG_KASAN enabled, it failed to boot up. Link: https://lkml.kernel.org/r/20210109044622.8312-1-hailongliiu@yeah.net Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn> Signed-off-by: Ziliang Guo <guo.ziliang@zte.com.cn> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
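A rough picture of the sizing problem, assuming the two-halves layout described above; the macro values and names here are purely illustrative, not arm's actual definitions:

```c
/* On 32-bit ARM a pte table holds the Linux entries followed by the
 * hardware entries; an early shadow table sized only for the first half
 * can spill into whatever symbol the linker happens to place next. */
#define PTRS_PER_PTE_SKETCH     512
#define PTE_HWTABLE_PTRS_SKETCH 512

static unsigned int early_shadow_pte_sketch[PTRS_PER_PTE_SKETCH +
					    PTE_HWTABLE_PTRS_SKETCH];
```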
-
Committed by Hugh Dickins
Boot a CONFIG_MEMCG=y kernel with "cgroup_disable=memory" and you are met by a series of warnings from the VM_WARN_ON_ONCE_PAGE(!memcg, page) recently added to the inline mem_cgroup_page_lruvec(). An earlier attempt to place that warning, in mem_cgroup_lruvec(), had been careful to do so after weeding out the mem_cgroup_disabled() case; but was itself invalid because of the mem_cgroup_lruvec(NULL, pgdat) in clear_pgdat_congested() and age_active_anon(). Warning in mem_cgroup_page_lruvec() was once useful in detecting a KSM charge bug, so may be worth keeping: but skip it if mem_cgroup_disabled(). Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101032056260.1093@eggly.anvils Fixes: 9a1ac228 ("mm/memcontrol:rewrite mem_cgroup_page_lruvec()") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hui Su <sh_def@163.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Chunguang Xu
The get_seconds() cleanup seems to have been completed; now it is time to delete the legacy interface to avoid misuse later. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1606816351-26900-1-git-send-email-brookxu@tencent.com
-
- 12 January 2021, 2 commits
-
-
Committed by Willem de Bruijn
skb_seq_read iterates over an skb, returning the pointer and length of the next data range with each call. It relies on kmap_atomic to access highmem pages when needed. An skb frag may be backed by a compound page, but kmap_atomic maps only a single page. There are not enough kmap slots to always map all pages concurrently. Instead, if kmap_atomic is needed, iterate over each page. As this increases the number of calls, avoid this unless needed. The necessary condition is captured in skb_frag_must_loop. I tried to make the change as obvious as possible. It should be easy to verify that nothing changes if skb_frag_must_loop returns false. Tested: On an x86 platform with CONFIG_HIGHMEM=y CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y CONFIG_NETFILTER_XT_MATCH_STRING=y Run: ip link set dev lo mtu 1500 iptables -A OUTPUT -m string --string 'badstring' --algo bm -j ACCEPT dd if=/dev/urandom of=in bs=1M count=20 nc -l -p 8000 > /dev/null & nc -w 1 -q 0 localhost 8000 < in Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
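A standalone model of the per-page loop; map()/unmap() stand in for kmap_atomic()/kunmap_atomic(), and everything here is illustrative rather than the kernel's code:

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096UL

/* Copy len bytes starting at frag_off within a compound buffer, mapping
 * and unmapping one page at a time instead of assuming a single mapping
 * covers the whole fragment. */
static void copy_frag(char *dst, size_t frag_off, size_t len,
		      const char *(*map)(size_t page_idx),
		      void (*unmap)(const char *addr))
{
	size_t done = 0;

	while (done < len) {
		size_t off = frag_off + done;
		size_t in_page = off % PAGE_SZ;
		size_t chunk = PAGE_SZ - in_page;

		if (chunk > len - done)
			chunk = len - done;

		const char *page = map(off / PAGE_SZ);   /* map just this page */
		memcpy(dst + done, page + in_page, chunk);
		unmap(page);
		done += chunk;
	}
}
```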
-
Committed by Willem de Bruijn
Skb frags may be backed by highmem and/or compound pages. Highmem pages need kmap_atomic mappings to access. But kmap_atomic maps a single page, not the entire compound page. skb_foreach_page iterates over an skb frag, in one step in the common case, page by page only if kmap_atomic must be called for each page. The decision logic is captured in skb_frag_must_loop. CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP extends kmap from highmem to all pages, to increase code coverage. Extend skb_frag_must_loop to this new condition. Link: https://lore.kernel.org/linux-mm/20210106180132.41dc249d@gandalf.local.home/ Fixes: 0e91a0c6 ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP") Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-