提交 · 9a6083becbe113ed1e28059ce659dc8ae71b33c3 · openeuler / Kernel

18 10月, 2021 24 次提交

block: move bio_full out of bio.h · 9a6083be

由 Christoph Hellwig 提交于 10月 12, 2021

bio_full is only used in bio.c, so move it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

9a6083be

block: fold bio_cur_bytes into blk_rq_cur_bytes · b6559d8f

由 Christoph Hellwig 提交于 10月 12, 2021

Fold bio_cur_bytes into the only caller.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b6559d8f

block: move bio_mergeable out of bio.h · 8addffd6

由 Christoph Hellwig 提交于 10月 12, 2021

bio_mergeable is only needed by I/O schedulers, so move it to
blk-mq-sched.h.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

8addffd6

block: don't include <linux/ioprio.h> in <linux/bio.h> · 11d9cab1

由 Christoph Hellwig 提交于 10月 12, 2021

bio.h doesn't need any of the definitions from ioprio.h.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

11d9cab1

block: remove BIO_BUG_ON · 9e8c0d0d

由 Christoph Hellwig 提交于 10月 12, 2021

BIO_DEBUG is always defined, so just switch the two instances to use
BUG_ON directly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

9e8c0d0d

block: move the *blkdev_ioctl declarations out of blkdev.h · 84b8514b

由 Christoph Hellwig 提交于 10月 12, 2021

These are only used inside of block/.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

84b8514b

block: pre-allocate requests if plug is started and is a batch · 47c122e3

由 Jens Axboe 提交于 10月 06, 2021

The caller typically has a good (or even exact) idea of how many requests
it needs to submit. We can make the request/tag allocation a lot more
efficient if we just allocate N requests/tags upfront when we queue the
first bio from the batch.

Provide a new plug start helper that allows the caller to specify how many
IOs are expected. This sets plug->nr_ios, and we can use that for smarter
request allocation. The plug provides a holding spot for requests, and
request allocation will check it before calling into the normal request
allocation path.

The blk_finish_plug() is called, check if there are unused requests and
free them. This should not happen in normal operations. The exception is
if we get merging, then we may be left with requests that need freeing
when done.

This raises the per-core performance on my setup from ~5.8M to ~6.1M
IOPS.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

47c122e3

block: bump max plugged deferred size from 16 to 32 · ba0ffdd8

由 Jens Axboe 提交于 10月 06, 2021

Particularly for NVMe with efficient deferred submission for many
requests, there are nice benefits to be seen by bumping the default max
plug count from 16 to 32. This is especially true for virtualized setups,
where the submit part is more expensive. But can be noticed even on
native hardware.

Reduce the multiple queue factor from 4 to 2, since we're changing the
default size.

While changing it, move the defines into the block layer private header.
These aren't values that anyone outside of the block layer uses, or
should use.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ba0ffdd8

blk-mq: Change shared sbitmap naming to shared tags · 079a2e3e

由 John Garry 提交于 10月 05, 2021

Now that shared sbitmap support really means shared tags, rename symbols
to match that.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

079a2e3e

blk-mq: Use shared tags for shared sbitmap support · e155b0c2

由 John Garry 提交于 10月 05, 2021

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full sets of static requests are used per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags per tagset and request queue, which will hold a set of shared static
rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

e155b0c2

block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ · d2a27964

由 John Garry 提交于 10月 05, 2021

It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as
the name BLKDEV_MAX_RQ would imply the max requests always, which it is
not.

Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being
the default number of requests assigned when allocating a request queue.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d2a27964

block: move struct request to blk-mq.h · 24b83deb

由 Christoph Hellwig 提交于 9月 20, 2021

struct request is only used by blk-mq drivers, so move it and all
related declarations to blk-mq.h.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

24b83deb

block: move integrity handling out of <linux/blkdev.h> · fe45e630

由 Christoph Hellwig 提交于 9月 20, 2021

Split the integrity/metadata handling definitions out into a new header.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

fe45e630

block: move a few merge helpers out of <linux/blkdev.h> · badf7f64

由 Christoph Hellwig 提交于 9月 20, 2021

These are block-layer internal helpers, so move them to block/blk.h and
block/blk-merge.c. Also update a comment a bit to use better grammar.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

badf7f64

block: drop unused includes in <linux/genhd.h> · b81e0c23

由 Christoph Hellwig 提交于 9月 20, 2021

Drop various include not actually used in genhd.h itself, and
move the remaning includes closer together.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b81e0c23

block: drop unused includes in <linux/blkdev.h> · 3ab0bc78

由 Christoph Hellwig 提交于 9月 20, 2021

Drop various include not actually used in blkdev.h itself.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

3ab0bc78

block: move elevator.h to block/ · 2e9bc346

由 Christoph Hellwig 提交于 9月 20, 2021

Except for the features passed to blk_queue_required_elevator_features,
elevator.h is only needed internally to the block layer. Move the
ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to
block/, dropping all the spurious includes outside of that.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

2e9bc346

block: remove the struct blk_queue_ctx forward declaration · 9778ac77

由 Christoph Hellwig 提交于 9月 20, 2021

This type doesn't exist at all, so no need to forward declare it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

9778ac77

block: remove the cmd_size field from struct request_queue · 713e4e11

由 Christoph Hellwig 提交于 9月 20, 2021

Entirely unused.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-11-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

713e4e11

block: remove the unused blk_queue_state enum · 90138237

由 Christoph Hellwig 提交于 9月 20, 2021

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

90138237

block: remove the unused rq_end_sector macro · 1d9433cd

由 Christoph Hellwig 提交于 9月 20, 2021

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1d9433cd

mm: don't include <linux/blkdev.h> in <linux/backing-dev.h> · ccdf7741

由 Christoph Hellwig 提交于 9月 20, 2021

Move inode_to_bdi out of line to avoid having to include blkdev.h.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

ccdf7741

mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h> · e41d12f5

由 Christoph Hellwig 提交于 9月 20, 2021

There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

e41d12f5

mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h> · 348332e0

由 Christoph Hellwig 提交于 9月 20, 2021

blk-cgroup.h pulls in blkdev.h and thus pretty much all the block
headers. Break this dependency chain by turning wbc_blkcg_css into a
macro and dropping the blk-cgroup.h include.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

348332e0

16 10月, 2021 1 次提交

block: drain file system I/O on del_gendisk · 8e141f9e

由 Christoph Hellwig 提交于 9月 29, 2021

Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called. This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.

Fixes: edb0872f ("block: move the bdi from the request_queue to the gendisk")
Reported-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.deTested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8e141f9e

14 10月, 2021 1 次提交

spi: Fix deadlock when adding SPI controllers on SPI buses · 6098475d

由 Mark Brown 提交于 10月 08, 2021

Currently we have a global spi_add_lock which we take when adding new
devices so that we can check that we're not trying to reuse a chip
select that's already controlled. This means that if the SPI device is
itself a SPI controller and triggers the instantiation of further SPI
devices we trigger a deadlock as we try to register and instantiate
those devices while in the process of doing so for the parent controller
and hence already holding the global spi_add_lock. Since we only care
about concurrency within a single SPI bus move the lock to be per
controller, avoiding the deadlock.

This can be easily triggered in the case of spi-mux.
Reported-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: NMark Brown <broonie@kernel.org>

6098475d

13 10月, 2021 5 次提交

net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib · 49f885b2

由 Vladimir Oltean 提交于 10月 12, 2021

Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.

This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.

The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.

We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...

We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.

Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Reported-by: NMichael Walle <michael@walle.cc>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

49f885b2

net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver · deab6b1c

由 Vladimir Oltean 提交于 10月 12, 2021

As explained here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocol drivers cannot depend on symbols exported by switch
drivers, because this creates a circular dependency that breaks module
autoloading.

The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
exported by the common ocelot switch lib. This function looks at
OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
DSA tag, for PTP timestamping (the command: one-step/two-step, and the
TX timestamp identifier).

None of that requires deep insight into the driver, it is quite
stateless, as it only depends upon the skb->cb. So let's make it a
static inline function and put it in include/linux/dsa/ocelot.h, a
file that despite its name is used by the ocelot switch driver for
populating the injection header too - since commit 40d3f295 ("net:
mscc: ocelot: use common tag parsing code with DSA").

With that function declared as static inline, its body is expanded
inside each call site, so the dependency is broken and the DSA tagger
can be built without the switch library, upon which the felix driver
depends.

Fixes: 39e5308b ("net: mscc: ocelot: support PTP Sync one-step timestamping")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

deab6b1c

net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e

由 Vladimir Oltean 提交于 9月 22, 2021

It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.

The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.

Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.

Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.

Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

4ac0567e

net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555

由 Vladimir Oltean 提交于 9月 22, 2021

The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.

The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.

So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).

On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.

DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.

When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.

The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.

To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).

With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.

Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

28da0555

net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp · 0bc73ad4

由 Aya Levin 提交于 9月 26, 2021

Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp opposite to RX-FCS configuration.

Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

0bc73ad4

09 10月, 2021 1 次提交

net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports · 5bded825

由 Vladimir Oltean 提交于 10月 07, 2021

Similar to commit 6087175b ("net: dsa: mt7530: use independent VLAN
learning on VLAN-unaware bridges"), software forwarding between an
unoffloaded LAG port (a bonding interface with an unsupported policy)
and a mv88e6xxx user port directly under a bridge is broken.

We adopt the same strategy, which is to make the standalone ports not
find any ATU entry learned on a bridge port.

Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
as many FIDs as VIDs (4096). The FID is derived from the VID when
possible (the VTU maps a VID to a FID), with a fallback to the port
based default FID value when not (802.1Q Mode is disabled on the port,
or the classified VID isn't present in the VTU).

The mv88e6xxx driver makes the following use of FIDs and VIDs:

- the port's DefaultVID (to which untagged & pvid-tagged packets get
  classified) is 0 and is absent from the VTU, so this kind of packets is
  processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.

- every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
  mv88e6xxx_atu_new() associates a FID with that VID which increases
  linearly starting from 1. Like this:

  bridge vlan add dev lan0 vid 100 # FID 1
  bridge vlan add dev lan1 vid 100 # still FID 1
  bridge vlan add dev lan2 vid 1024 # FID 2

The FID allocation made by the driver is sub-optimal for the following
reasons:

(a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
    A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
    of 0 too. The difference is that the bridged ports may learn ATU
    entries, while the standalone port has the requirement that it must
    not, and must not find them either. Standalone ports must not use
    the same FID as ports belonging to a bridge. All standalone ports
    can use the same FID, since the ATU will never have an entry in
    that FID.

(b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
    default FID of 0 on all their ports. The FDBs will not be isolated
    between these bridges. Every VLAN-unaware bridge must use the same
    FID on all its ports, different from the FID of other bridge ports.

(c) Each bridge VLAN uses a unique FID which is useful for Independent
    VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
    will result in the same FID being used by mv88e6xxx_atu_new().
    The correct behavior is for VLAN 1 in br0 to have a different FID
    compared to VLAN 1 in br1.

This patch cannot fix all the above. Traditionally the DSA framework did
not care about this, and the reality is that DSA core involvement is
needed for the aforementioned issues to be solved. The only thing we can
solve here is an issue which does not require API changes, and that is
issue (a), aka use a different FID for standalone ports vs ports under
VLAN-unaware bridges.

The first step is deciding what VID and FID to use for standalone ports,
and what VID and FID for bridged ports. The 0/0 pair for standalone
ports is what they used up till now, let's keep using that. For bridged
ports, there are 2 cases:

- VLAN-aware ports will never end up using the port default FID, because
  packets will always be classified to a VID in the VTU or dropped
  otherwise. The FID is the one associated with the VID in the VTU.

- On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
  zero (just as in the case of standalone ports), and just change the
  port's default FID from 0 to a different number (say 1).

However, Tobias points out that there is one more requirement to cater to:
cross-chip bridging. The Marvell DSA header does not carry the FID in
it, only the VID. So once a packet crosses a DSA link, if it has a VID
of zero it will get classified to the default FID of that cascade port.
Relying on a port default FID for upstream cascade ports results in
contradictions: a default FID of 0 breaks ATU isolation of bridged ports
on the downstream switch, a default FID of 1 breaks standalone ports on
the downstream switch.

So not only must standalone ports have different FIDs compared to
bridged ports, they must also have different DefaultVID values.
IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
bridges, and VID 4095 maps to FID 1.

For the xmit operation to look up the same ATU database, we need to put
VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
too. All shared ports are configured to map this VID to the bridging
FID, because they are members of that VLAN in the VTU. Shared ports
don't need to have 802.1QMode enabled in any way, they always parse the
VID from the DSA header, they don't need to look at the 802.1Q header.

We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
mention that mv88e6xxx_vtu_setup() which was located right below that
call was flushing the VTU so those entries wouldn't be preserved.
So we need to relocate the VTU flushing prior to the port initialization
during ->setup(). Also note that this is why it is safe to assume that
VID 4095 will get associated with FID 1: the user ports haven't been
created, so there is no avenue for the user to create a bridge VLAN
which could otherwise race with the creation of another FID which would
otherwise use up the non-reserved FID value of 1.

[ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
  specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]

mv88e6xxx_port_db_load_purge() is the function to access the ATU for
FDB/MDB entries, and it used to determine the FID to use for
VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
But the driver only called mv88e6xxx_port_set_fid() once, during probe,
so no surprises, the port FID was always 0, the call to get_fid() was
redundant. As much as I would have wanted to not touch that code, the
logic is broken when we add a new FID which is not the port-based
default. Now the port-based default FID only corresponds to standalone
ports, and FDB/MDB entries belong to the bridging service. So while in
the future, when the DSA API will support FDB isolation, we will have to
figure out the FID based on the bridge number, for now there's a single
bridging FID, so hardcode that.

Lastly, the tagger needs to check, when it is transmitting a VLAN
untagged skb, whether it is sending it towards a bridged or a standalone
port. When we see it is bridged we assume the bridge is VLAN-unaware.
Not because it cannot be VLAN-aware but:

- if we are transmitting from a VLAN-aware bridge we are likely doing so
  using TX forwarding offload. That code path guarantees that skbs have
  a vlan hwaccel tag in them, so we would not enter the "else" branch
  of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.

- if we are transmitting on behalf of a VLAN-aware bridge but with no TX
  forwarding offload (no PVT support, out of space in the PVT, whatever),
  we would indeed be transmitting with VLAN 4095 instead of the bridge
  device's pvid. However we would be injecting a "From CPU" frame, and
  the switch won't learn from that - it only learns from "Forward" frames.
  So it is inconsequential for address learning. And VLAN 4095 is
  absolutely enough for the frame to exit the switch, since we never
  remove that VLAN from any port.

Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
Reported-by: NTobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

5bded825

07 10月, 2021 1 次提交

qcom_scm: hide Kconfig symbol · 424953cf

由 Arnd Bergmann 提交于 9月 28, 2021

Now that SCM can be a loadable module, we have to add another
dependency to avoid link failures when ipa or adreno-gpu are
built-in:

aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'

ld.lld: error: undefined symbol: qcom_scm_is_available
>>> referenced by adreno_gpu.c
>>>               gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in archive drivers/built-in.a

This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
use a similar dependency here to what we have for QCOM_RPROC_COMMON,
but that causes dependency loops from other things selecting QCOM_SCM.

This appears to be an endless problem, so try something different this
time:

 - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
   but that is simply selected by all of its users

 - All the stubs in include/linux/qcom_scm.h can go away

 - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to
   allow compile-testing QCOM_SCM on all architectures.

 - To avoid a circular dependency chain involving RESET_CONTROLLER
   and PINCTRL_SUNXI, drop the 'select RESET_CONTROLLER' statement.
   According to my testing this still builds fine, and the QCOM
   platform selects this symbol already.
Acked-by: NKalle Valo <kvalo@codeaurora.org>
Acked-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

424953cf

05 10月, 2021 2 次提交

ARM: omap1: move omap15xx local bus handling to usb.c · 94ad8aac

由 Arnd Bergmann 提交于 9月 27, 2021

Commit 38225f2e ("ARM/omap1: switch to use dma_direct_set_offset for
lbus DMA offsets") removed a lot of mach/memory.h, but left the USB
offset handling split into arch/arm/mach-omap1/usb.c and
drivers/usb/host/ohci-omap.c.

This can cause a randconfig build warning that now fails the build
with -Werror:

arch/arm/mach-omap1/usb.c:561:30: error: 'omap_1510_usb_ohci_nb' defined but not used [-Werror=unused-variable]
  561 | static struct notifier_block omap_1510_usb_ohci_nb = {
      |                              ^~~~~~~~~~~~~~~~~~~~~

Move it all into the platform file to get rid of the final
location that relies on mach/memory.h.
Acked-by: NFelipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>
Acked-by: NTony Lindgren <tony@atomide.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210927144118.2464881-1-arnd@kernel.org'
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

94ad8aac

etherdevice: use __dev_addr_set() · 3f6cffb8

由 Jakub Kicinski 提交于 10月 04, 2021

Andrew points out that eth_hw_addr_set() replaces memcpy()
calls so we can't use ether_addr_copy() which assumes
both arguments are 2-bytes aligned.
Reported-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f6cffb8

01 10月, 2021 2 次提交

sched: Always inline is_percpu_thread() · 83d40a61

由 Peter Zijlstra 提交于 9月 20, 2021

vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org

83d40a61

perf/core: fix userpage->time_enabled of inactive events · f7925653

由 Song Liu 提交于 9月 29, 2021

Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage->time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage->time_enabled. This can
be reproduced with something like:

   /* open 20 task perf_event "cycles", to create multiplexing */

   fd = perf_event_open();  /* open task perf_event "cycles" */
   userpage = mmap(fd);     /* use mmap and rdmpc */

   while (true) {
     time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
     time_enabled_read = read(fd).time_enabled;
     if (time_enabled_mmap > time_enabled_read)
         BUG();
   }

Fix this by updating userpage for inactive events in merge_sched_in.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reported-and-tested-by: NLucian Grijincu <lucian@fb.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com

f7925653

25 9月, 2021 2 次提交

mm/debug: sync up latest migrate_reason to migrate_reason_names · 57ed7b43

由 Weizhao Ouyang 提交于 9月 24, 2021

Sync up MR_DEMOTION to migrate_reason_names and add a synch prompt.

Link: https://lkml.kernel.org/r/20210921064553.293905-3-o451686892@gmail.com
Fixes: 26aa2d19 ("mm/migrate: demote pages during reclaim")
Signed-off-by: NWeizhao Ouyang <o451686892@gmail.com>
Reviewed-by: N"Huang, Ying" <ying.huang@intel.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57ed7b43

mm: fs: invalidate bh_lrus for only cold path · 243418e3

由 Minchan Kim 提交于 9月 24, 2021

The kernel test robot reported the regression of fio.write_iops[1] with
commit 8cc621d2 ("mm: fs: invalidate BH LRU during page migration").

Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.

This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).

Zhengjun Xing confirmed
 "I test the patch, the regression reduced to -2.9%"

[1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/
[2] 8cc621d2, mm: fs: invalidate BH LRU during page migration

Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
Reported-by: Nkernel test robot <oliver.sang@intel.com>
Reviewed-by: NChris Goldsworthy <cgoldswo@codeaurora.org>
Tested-by: N"Xing, Zhengjun" <zhengjun.xing@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

243418e3

24 9月, 2021 1 次提交

driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD · 5501765a

由 Saravana Kannan 提交于 9月 15, 2021

If a parent device is also a supplier to a child device, fw_devlink=on by
design delays the probe() of the child device until the probe() of the
parent finishes successfully.

However, some drivers of such parent devices (where parent is also a
supplier) expect the child device to finish probing successfully as soon as
they are added using device_add() and before the probe() of the parent
device has completed successfully. One example of such a case is discussed
in the link mentioned below.

Add a flag to make fw_devlink=on not enforce these supplier-consumer
relationships, so these drivers can continue working.

Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
Fixes: ea718c69 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: NSaravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915170940.617415-3-saravanak@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5501765a

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功