Commits · 64cae2fb48ee10a84b044c516dba1b44d7f4d161 · openeuler / Kernel

07 Aug, 2020 · 1 commit

bpf: Change uapi for bpf iterator map elements · 5e7b3020

Committed by Yonghong Song on Aug 04, 2020

Commit a5cbe05a ("bpf: Implement bpf iterator for
map elements") added bpf iterator support for
map elements. The map element bpf iterator requires
info to identify a particular map. In the above
commit, the attr->link_create.target_fd is used
to carry map_fd and an enum bpf_iter_link_info
is added to uapi to specify the target_fd actually
representing a map_fd:
    enum bpf_iter_link_info {
	BPF_ITER_LINK_UNSPEC = 0,
	BPF_ITER_LINK_MAP_FD = 1,

	MAX_BPF_ITER_LINK_INFO,
    };

This is an extensible approach as we can grow
enumerator for pid, cgroup_id, etc. and we can
unionize target_fd for pid, cgroup_id, etc.
But in the future, there are chances that
more complex customization may happen, e.g.,
for tasks, it could be filtered based on
both cgroup_id and user_id.

This patch changed the uapi to have fields
	__aligned_u64	iter_info;
	__u32		iter_info_len;
for additional iter_info for link_create.
The iter_info is defined as
	union bpf_iter_link_info {
		struct {
			__u32   map_fd;
		} map;
	};

So future extension for additional customization
will be easier. The bpf_iter_link_info will be
passed to the target callback for validation, and the generic
bpf_iter framework does not need to deal with it any
more.

Note that map_fd = 0 will be considered invalid
and -EBADF will be returned to user space.

Fixes: a5cbe05a ("bpf: Implement bpf iterator for map elements")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com

5e7b3020
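
The validation rule in the commit above (map_fd = 0 is rejected with -EBADF) can be sketched in userspace C. This is only an illustration: the union mirrors the one quoted in the commit message, while `validate_map_iter_info()` and its -EINVAL length check are hypothetical and are not the kernel's actual target callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the union quoted in the commit message. */
union bpf_iter_link_info {
	struct {
		uint32_t map_fd;
	} map;
};

/*
 * Hypothetical target-side validator. Returns the map fd on success
 * or a negative errno. The -EINVAL length check is an assumption made
 * for this sketch; only the -EBADF rule comes from the commit message.
 */
int validate_map_iter_info(const union bpf_iter_link_info *linfo,
			   uint32_t linfo_len)
{
	assert(linfo != NULL);

	if (linfo_len < sizeof(linfo->map.map_fd))
		return -EINVAL;
	if (linfo->map.map_fd == 0)	/* fd 0 is treated as "not set" */
		return -EBADF;
	return (int)linfo->map.map_fd;
}
```

Because validation lives in a per-target function like this one, adding another member to the union would not require touching the generic code path.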

05 Aug, 2020 · 1 commit

ipv4: route: Ignore output interface in FIB lookup for PMTU route · df23bb18

Committed by Stefano Brivio on Aug 04, 2020

Currently, processes sending traffic to a local bridge with an
encapsulation device as a port don't get ICMP errors if they exceed
the PMTU of the encapsulated link.

David Ahern suggested this as a hack, but it actually looks like
the correct solution: when we update the PMTU for a given destination
by means of updating or creating a route exception, the encapsulation
might trigger this because of PMTU discovery happening either on the
encapsulation device itself, or its lower layer. This happens on
bridged encapsulations only.

The output interface shouldn't matter, because we already have a
valid destination. Drop the output interface restriction from the
associated route lookup.

For UDP tunnels, we will now have a route exception created for the
encapsulation itself, with a MTU value reflecting its headroom, which
allows a bridge forwarding IP packets originated locally to deliver
errors back to the sending socket.

The behaviour is now consistent with IPv6 and verified with selftests
pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in
this series.

v2:
- reset output interface only for bridge ports (David Ahern)
- add and use netif_is_any_bridge_port() helper (David Ahern)
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df23bb18

04 Aug, 2020 · 6 commits

random32: move the pseudo-random 32-bit definitions to prandom.h · c0842fbc

Committed by Linus Torvalds on Jul 31, 2020

The addition of percpu.h to the list of includes in random.h revealed
some circular dependencies on arm64 and possibly other platforms.  This
include was added solely for the pseudo-random definitions, which have
nothing to do with the rest of the definitions in this file but are
still there for legacy reasons.

This patch moves the pseudo-random parts to linux/prandom.h and the
percpu.h include with it, which is now guarded by _LINUX_PRANDOM_H and
protected against recursive inclusion.

A further cleanup step would be to remove this from <linux/random.h>
entirely, and make people who use the prandom infrastructure include
just the new header file.  That's a bit of a churn patch, but grepping
for "prandom_", "next_pseudo_random32" and "struct rnd_state" should
catch most users.

But it turns out that that nice cleanup step is fairly painful, because
a _lot_ of code currently seems to depend on the implicit include of
<linux/random.h>, which can currently come in a lot of ways, including
such fairly core headers as <linux/net.h>.

So the "nice cleanup" part may or may never happen.

Fixes: 1c9df907 ("random: fix circular include dependency on arm64 after addition of percpu.h")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c0842fbc

net: dsa: loop: Wire-up MTU callbacks · c99194ed

Committed by Florian Fainelli on Aug 03, 2020

For now we simply store the port MTU into a per-port member.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

c99194ed

net: dsa: loop: Move data structures to header · 6c84a589

Committed by Florian Fainelli on Aug 03, 2020

In preparation for adding support for a mockup data path, move the
driver data structures to include/linux/dsa/loop.h such that we can
share them between net/dsa/ and drivers/net/dsa/ later on.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c84a589

gpio: don't use same lockdep class for all devm_gpiochip_add_data users · 5f402bb1

Committed by Ahmad Fatoum on Jul 31, 2020

Commit 959bc7b2 ("gpio: Automatically add lockdep keys") documents
in its commit message its intention to "create a unique class key for
each driver".

It does so by having gpiochip_add_data add in-place the definition of
two static lockdep classes for LOCKDEP use. That way, every caller of
the macro adds their gpiochip with unique lockdep classes.

There are many indirect callers of gpiochip_add_data, however, via
use of devm_gpiochip_add_data. devm_gpiochip_add_data has external
linkage and all its users will share the same lockdep classes, which
probably is not intended.

Fix this by replicating the gpiochip_add_data statics-in-macro for
the devm_ version as well.

Fixes: 959bc7b2 ("gpio: Automatically add lockdep keys")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20200731123835.8003-1-a.fatoum@pengutronix.de
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

5f402bb1
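
The difference between a static inside a macro and a static inside a function with external linkage, which is the crux of the commit above, can be shown in a small userspace sketch. All names here (`chip_add_shared`, `chip_add_unique`, the one-byte `lock_class_key`) are stand-ins invented for illustration and are not the gpiolib API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's lockdep key type. */
struct lock_class_key { char dummy; };

struct chip { const struct lock_class_key *key; };

/* Out-of-line function: a single static key is shared by every caller. */
void chip_add_shared(struct chip *c)
{
	static struct lock_class_key key;

	assert(c != NULL);
	c->key = &key;
}

void chip_add_with_key(struct chip *c, const struct lock_class_key *key)
{
	assert(c != NULL && key != NULL);
	c->key = key;
}

/* Macro: a fresh static key is instantiated at each expansion site. */
#define chip_add_unique(c) do {				\
		static struct lock_class_key key_;	\
		chip_add_with_key((c), &key_);		\
	} while (0)
```

Two calls to `chip_add_shared()` end up with the same key address, while two expansions of `chip_add_unique()` get distinct keys, which is the per-call-site behaviour the fix restores for the devm_ variant.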

net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct · 038ebb1a

Committed by wenxu on Jul 31, 2020

When openvswitch conntrack is offloaded with the act_ct action, fragmented
packets are defragmented in the ingress tc act_ct action and then miss the
next chain. The packet is then passed to the openvswitch datapath without
the mru, and the over-MTU packet is dropped by the output action in openvswitch:

"kernel: net2: dropped over-mtu packet: 1528 > 1500"

This patch adds the mru to tc_skb_ext for the defrag-and-miss-next-chain
case, and also adds the mru to qdisc_skb_cb. act_ct sets the mru in
qdisc_skb_cb when the packet is defragmented. When the chain is missed,
the mru is copied to tc_skb_ext, from which the ovs datapath can retrieve it.

Fixes: b57dc7c1 ("net/sched: Introduce action ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

038ebb1a

net: mdiobus: add reset-post-delay-us handling · bb383129

Committed by Bruno Thomsen on Jul 30, 2020

Load the new "reset-post-delay-us" value from MDIO properties,
and if it is configured to a delay greater than zero, do a
flexible sleeping delay after MDIO bus reset deassert.
This allows devices to exit the reset state before bus
communication starts.
Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb383129

03 Aug, 2020 · 2 commits

mac80211: Handle special status codes in SAE commit · 4e56cde1

Committed by Jouni Malinen on Jul 31, 2020

SAE authentication has been extended with H2E (IEEE 802.11 REVmd) and PK
(WFA) options. Those extensions use special status code values in the
SAE commit messages (Authentication frame with transaction sequence
number 1) to identify which extension is in use. mac80211 was
interpreting those new values as the AP denying authentication and that
resulted in failure to complete SAE authentication in some cases.

Fix this by adding exceptions for the new status code values 126 and
127.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200731183830.18735-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

4e56cde1
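
The exception described in the commit above can be sketched as a small predicate. This is a hypothetical userspace illustration, not the mac80211 code: the function and constant names are invented, and since the commit message does not say which of the two values belongs to H2E and which to PK, the constants are named neutrally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	STATUS_SUCCESS      = 0,
	/* Special SAE values named in the commit message. */
	STATUS_SAE_EXT_LOW  = 126,
	STATUS_SAE_EXT_HIGH = 127,
};

/*
 * Returns true when an SAE Authentication frame should be treated as
 * accepted. Only the commit message (transaction sequence number 1)
 * may carry the special status codes.
 */
bool sae_status_accepted(uint16_t transaction, uint16_t status)
{
	/* SAE uses transaction 1 (commit) and 2 (confirm). */
	assert(transaction == 1 || transaction == 2);

	if (status == STATUS_SUCCESS)
		return true;
	return transaction == 1 &&
	       (status == STATUS_SAE_EXT_LOW || status == STATUS_SAE_EXT_HIGH);
}
```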

list: add "list_del_init_careful()" to go with "list_empty_careful()" · c6fe44d9

Committed by Linus Torvalds on Jul 23, 2020

That gives us ordering guarantees around the pair.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c6fe44d9

02 Aug, 2020 · 3 commits

wilc1000: Move wilc1000 SDIO ID's from driver source to common header file · c83e2a6e

Committed by Ajay Singh on Jul 17, 2020

Moved macros used for Vendor/Device ID from wilc1000 driver to common
header file and changed macro name for consistency with other macros.
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200717051134.19160-1-ajay.kathat@microchip.com

c83e2a6e

bpf: Add support for forced LINK_DETACH command · 73b11c2a

Committed by Andrii Nakryiko on Jul 31, 2020

Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defunct and doesn't hold BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.

This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com

73b11c2a

fs: optimise kiocb_set_rw_flags() · 1752f0ad

Committed by Pavel Begunkov on Aug 01, 2020

Use a local var to collect flags in kiocb_set_rw_flags(). That spares
some memory writes and allows most of the jumps to be replaced with MOVEcc.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1752f0ad
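
The optimisation pattern named in the commit above, collecting flags in a local variable and writing them to memory once, might look roughly like the following. This is a sketch under assumptions: the flag names and values and the `struct request` type are hypothetical and do not correspond to the real RWF_* and IOCB_* definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical caller-visible request flags. */
#define REQ_HIPRI	0x1u
#define REQ_DSYNC	0x2u
#define REQ_APPEND	0x4u
#define REQ_SUPPORTED	(REQ_HIPRI | REQ_DSYNC | REQ_APPEND)

/* Hypothetical internal flags stored in the request object. */
#define INT_HIPRI	0x10u
#define INT_DSYNC	0x20u
#define INT_APPEND	0x40u

struct request { unsigned int flags; };

int set_rw_flags_sketch(struct request *req, unsigned int in)
{
	unsigned int out = 0;	/* accumulated locally, likely in a register */

	assert(req != NULL);

	if (in & ~REQ_SUPPORTED)
		return -EOPNOTSUPP;

	/* Each test only updates the local, so it can compile branch-free. */
	if (in & REQ_HIPRI)
		out |= INT_HIPRI;
	if (in & REQ_DSYNC)
		out |= INT_DSYNC;
	if (in & REQ_APPEND)
		out |= INT_APPEND;

	req->flags |= out;	/* single write to memory at the end */
	return 0;
}
```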

01 Aug, 2020 · 3 commits

sched: Document arch_scale_*_capacity() · f4470cdf

Committed by Valentin Schneider on Jul 31, 2020

Rather than hide their purpose in some dark, damp corner of Documentation/,
add some documentation to the default implementations.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200731192016.7484-2-valentin.schneider@arm.com

f4470cdf

rtnetlink: add support for protodown reason · 829eb208

Committed by Roopa Prabhu on Jul 31, 2020

netdev protodown is a mechanism that allows protocols to
hold an interface down. It was initially introduced in
the kernel to hold links down by a multihoming protocol.
There was also an attempt to introduce a protodown
reason at the time, but it was rejected. protodown and protodown reason
are supported by almost every switching and routing platform.
It was ok for a while to live without a protodown reason.
But it has become more critical now, given that more than
one protocol may need to keep a link down on a system
at the same time, e.g. vrrp peer node, port security,
multihoming protocol. It's common for network operators and
protocol developers to look for such a reason on a networking
box (it's also known as errDisable by most network operators).

This patch adds support for link protodown reason
attribute. There are two ways to maintain protodown
reasons.
(a) enumerate every possible reason code in kernel
    - A protocol developer has to make a request and
      have that appear in a certain kernel version
(b) provide the bits in the kernel, and allow user-space
    (sysadmin or NOS distributions) to manage the bit-to-reasonname
    map.
    - This makes extending reason codes easier (kind of like
      the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)

This patch takes approach (b).

A few things about the patch:
- It treats the protodown reason bits as a counter to indicate
  active protodown users
- Since the protodown attribute is already an exposed UAPI,
  the reason is not enforced on a protodown set. It's a no-op
  if not used.

The patch follows the algorithm below:
  - presence of reason bits set indicates protodown
    is in use
  - user can set protodown and protodown reason in a
    single or multiple setlink operations
  - setlink operation to clear protodown, will return -EBUSY
    if there are active protodown reason bits
  - reason is not included in link dumps if not used

Example with patched iproute2:
$ cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity

$ ip link set dev vxlan0 protodown on protodown_reason vrrp on
$ ip link set dev vxlan0 protodown_reason mlag on
$ ip link show
14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>

$ ip link set dev vxlan0 protodown_reason mlag off
$ ip link set dev vxlan0 protodown off protodown_reason vrrp off
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

829eb208
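
The algorithm listed in the commit above (reason bits mark active users, and clearing protodown fails with -EBUSY while any bit is set) can be modelled in a few lines of userspace C. This is a minimal sketch, assuming a 32-bit reason mask; `struct link_state`, `set_reason()` and `set_protodown()` are hypothetical names, not the rtnetlink implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct link_state {
	bool protodown;
	uint32_t reasons;	/* one bit per user-defined reason */
};

/* Set or clear a single reason bit, as "protodown_reason <name> on|off" would. */
void set_reason(struct link_state *l, unsigned int bit, bool on)
{
	assert(l != NULL && bit < 32);

	if (on)
		l->reasons |= UINT32_C(1) << bit;
	else
		l->reasons &= ~(UINT32_C(1) << bit);
}

/* Clearing protodown is refused while any reason bit is still active. */
int set_protodown(struct link_state *l, bool on)
{
	assert(l != NULL);

	if (!on && l->reasons != 0)
		return -EBUSY;
	l->protodown = on;
	return 0;
}
```

With the bit numbering from the r.conf example (0 for mlag, 2 for vrrp), setting both reasons and then trying to clear protodown yields -EBUSY until both bits have been turned off again.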

tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS · 48040793

Committed by Yousuk Seung on Jul 30, 2020

This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
the earliest departure time (EDT) of the timestamped skb. By tracking EDT
values of the skb from different timestamps, we can observe when and how
much the value changed. This allows measuring the precise delay
injected on the sender host, e.g. by a bpf-based throttler.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48040793

31 Jul, 2020 · 7 commits

kcsan: Improve IRQ state trace reporting · 92c209ac

Committed by Marco Elver on Jul 29, 2020

To improve the general usefulness of the IRQ state trace events with
KCSAN enabled, save and restore the trace information when entering and
exiting the KCSAN runtime as well as when generating a KCSAN report.

Without this, reporting the IRQ trace events (whether via a KCSAN report
or outside of KCSAN via a lockdep report) is rather useless due to
continuously being touched by KCSAN. This is because if KCSAN is
enabled, every instrumented memory access causes changes to IRQ trace
events (either by KCSAN disabling/enabling interrupts or taking
report_lock when generating a report).

Before "lockdep: Prepare for NMI IRQ state tracking", KCSAN avoided
touching the IRQ trace events via raw_local_irq_save/restore() and
lockdep_off/on().

Fixes: 248591f5 ("kcsan: Make KCSAN compatible with new IRQ state tracking")
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200729110916.3920464-2-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

92c209ac

lockdep: Refactor IRQ trace events fields into struct · 0584df9c

Committed by Marco Elver on Jul 29, 2020

Refactor the IRQ trace events fields, used for printing information
about the IRQ trace events, into a separate struct 'irqtrace_events'.

This improves readability by separating the information only used in
reporting, as well as enables (simplified) storing/restoring of
irqtrace_events snapshots.

No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200729110916.3920464-1-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

0584df9c

lib: Add zstd support to decompress · 4963bb2b

Committed by Nick Terrell on Jul 30, 2020

- Add unzstd() and the zstd decompress interface.

- Add zstd support to decompress_method().

The decompress_method() and unzstd() functions are used to decompress
the initramfs and the initrd. The __decompress() function is used in
the preboot environment to decompress a zstd compressed kernel.

The zstd decompression function allows the input and output buffers to
overlap because that is used by x86 kernel decompression.
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-3-nickrterrell@gmail.com

4963bb2b

lib/mpi: Add mpi_sub_ui() · 4278e9d9

Committed by Marcelo Henrique Cerri on Jul 20, 2020

Add mpi_sub_ui() based on Gnu MP mpz_sub_ui() function from file
mpz/aors_ui.h[1] from change id 510b83519d1c adapting the code to the
kernel's data structures, helper functions and coding style and also
removing the defines used to produce mpz_sub_ui() and mpz_add_ui()
from the same code.

[1] https://gmplib.org/repo/gmp-6.2/file/510b83519d1c/mpz/aors.h
Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4278e9d9

tasklet: Introduce new initialization API · 12cc923f

Committed by Romain Perier on Sep 29, 2019

Nowadays, modern kernel subsystems that use callbacks pass the data
structure associated with a given callback as argument to the callback.
The tasklet subsystem remains one which passes an arbitrary unsigned
long to the callback function. This has several problems:

- This keeps an extra field for storing the argument in each tasklet
  data structure, it bloats the tasklet_struct structure with a redundant
  .data field

- No type checking can be performed on this argument. Instead of
  using container_of() like other callback subsystems, it forces callbacks
  to do explicit type cast of the unsigned long argument into the required
  object type.

- Buffer overflows can overwrite the .func and the .data field, so
  an attacker can easily overwrite the function and its first argument
  to whatever it wants.

Add a new tasklet initialization API, via DECLARE_TASKLET() and
tasklet_setup(), which will replace the existing ones.

This work is greatly inspired by the timer_struct conversion series,
see commit e99e88a9 ("treewide: setup_timer() -> timer_setup()")

To avoid problems with both -Wcast-function-type (which is enabled in
the kernel via -Wextra in several subsystems), and with mismatched
function prototypes when built with Control Flow Integrity enabled,
this adds the "use_callback" member to let the tasklet caller choose
which union member to call through. Once all old API uses are removed,
this and the .data member will be removed as well. (On 64-bit this does
not grow the struct size as the new member fills the hole after atomic_t,
which is also "int" sized.)
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Co-developed-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>

12cc923f
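
The callback style that the commit above moves toward, where the callback receives a pointer to the tasklet itself and recovers its owner with container_of() instead of casting an unsigned long, can be sketched in userspace as follows. The types and the `*_sketch` helpers are simplified stand-ins invented for this example; they are not the kernel's tasklet API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: the callback takes the tasklet itself. */
struct tasklet {
	void (*callback)(struct tasklet *t);
};

/* Userspace equivalent of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A driver object that embeds its tasklet. */
struct my_dev {
	int irq_count;
	struct tasklet bh;
};

/* Typed callback: no cast from unsigned long, no separate .data field. */
void my_dev_bh(struct tasklet *t)
{
	struct my_dev *dev = container_of(t, struct my_dev, bh);

	dev->irq_count++;
}

void tasklet_setup_sketch(struct tasklet *t, void (*cb)(struct tasklet *))
{
	assert(t != NULL && cb != NULL);
	t->callback = cb;
}

void tasklet_run_sketch(struct tasklet *t)
{
	assert(t != NULL && t->callback != NULL);
	t->callback(t);
}
```

Because the owning object is derived from the tasklet's own address, there is no stored argument for an attacker to overwrite and the compiler can type-check the callback.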

treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() · b13fecb1

Committed by Kees Cook on Jul 13, 2020

This replaces all the existing DECLARE_TASKLET() (and ...DISABLED)
macros with DECLARE_TASKLET_OLD() in preparation for refactoring the
tasklet callback type. All existing DECLARE_TASKLET() users had a "0"
data argument; it has been removed here as well.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>

b13fecb1

random: fix circular include dependency on arm64 after addition of percpu.h · 1c9df907

Committed by Willy Tarreau on Jul 30, 2020

Daniel Díaz and Kees Cook independently reported that commit
f227e3ec ("random32: update the net random state on interrupt and
activity") broke arm64 due to a circular dependency on include files
since the addition of percpu.h in random.h.

The correct fix would definitely be to move all the prandom32 stuff out
of random.h but for backporting, a smaller solution is preferred.

This one replaces linux/percpu.h with asm/percpu.h, and this fixes the
problem on x86_64, arm64, arm, and mips.  Note that moving percpu.h
around didn't change anything and that removing it entirely broke
differently.  When backporting, such options might still be considered
if this patch fails to help.

[ It turns out that an alternate fix seems to be to just remove the
  troublesome <asm/pointer_auth.h> include from the arm64 <asm/smp.h>
  that causes the circular dependency.

  But we might as well do the whole belt-and-suspenders thing, and
  minimize inclusion in <linux/random.h> too. Either will fix the
  problem, and both are good changes.   - Linus ]
Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
Reported-by: Kees Cook <keescook@chromium.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Fixes: f227e3ec
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1c9df907

30 Jul, 2020 · 5 commits

PM / devfreq: Add support delayed timer for polling mode · 4dc3bab8

Committed by Chanwoo Choi on Jul 02, 2020

Until now, devfreq drivers using polling mode, such as those with the
simple_ondemand governor, have used only a deferrable timer in order to
reduce redundant power consumption. It reduces CPU wake-ups from idle caused
by the polling mode, which checks the status of a non-CPU device.

But this is a problem for non-CPU devices such as a DMC device with DMA operation.
Some non-CPU devices need to be monitored continuously, regardless of CPU state,
in order to decide the proper next status of the non-CPU device.

So, add support for the delayed timer in polling mode to support
repetitive monitoring. The devfreq driver and user can select
the kind of timer: either the deferrable or the delayed timer.

For example, change the timer type of the DMC device
on the Exynos5422-based Odroid-XU3 as follows:

- To use the deferrable timer:
echo deferrable > /sys/class/devfreq/10c20000.memory-controller/timer

- To use the delayed timer:
echo delayed > /sys/class/devfreq/10c20000.memory-controller/timer
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>

4dc3bab8

driver core: add device probe log helper · a787e540

Committed by Andrzej Hajda on Jul 13, 2020

During probe, every time a driver gets a resource it should usually check
for an error, print a message if the error is not -EPROBE_DEFER, and return
the error. This pattern is simple but requires adding a few lines after any
resource acquisition code; as a result it is often omitted or implemented
only partially.
dev_err_probe helps to replace such code sequences with a simple call,
so code:
	if (err != -EPROBE_DEFER)
		dev_err(dev, ...);
	return err;
becomes:
	return dev_err_probe(dev, err, ...);
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20200713144324.23654-2-a.hajda@samsung.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a787e540
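
The behaviour described in the commit above (log unless the error is -EPROBE_DEFER, then return the error) can be mocked in userspace to make the control flow concrete. This is a sketch, not the driver-core implementation: `dev_err_probe_sketch()` takes a device name string instead of a `struct device *`, writes to stderr, and keeps a `messages_logged` counter purely for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* kernel-internal errno value */

/* Counts emitted messages so the behaviour can be observed. */
int messages_logged;

int dev_err_probe_sketch(const char *dev_name, int err, const char *fmt, ...)
{
	assert(dev_name != NULL && fmt != NULL);

	/* Deferred probes are expected, so they stay silent. */
	if (err != -EPROBE_DEFER) {
		va_list args;

		fprintf(stderr, "%s: error %d: ", dev_name, err);
		va_start(args, fmt);
		vfprintf(stderr, fmt, args);
		va_end(args);
		messages_logged++;
	}
	return err;	/* always hand the error back to the caller */
}
```

A caller can then write `return dev_err_probe_sketch("demo", err, "no clock\n");` in place of the three-line pattern quoted in the commit message.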

random32: remove net_rand_state from the latent entropy gcc plugin · 83bdc727

Committed by Linus Torvalds on Jul 29, 2020

It turns out that the plugin right now ends up being really unhappy
about the change from 'static' to 'extern' storage that happened in
commit f227e3ec ("random32: update the net random state on interrupt
and activity").

This is probably a trivial fix for the latent_entropy plugin, but for
now, just remove net_rand_state from the list of things the plugin
worries about.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

83bdc727

random32: update the net random state on interrupt and activity · f227e3ec

Committed by Willy Tarreau on Jul 10, 2020

This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.

Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.

In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state.  For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.
Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f227e3ec

cpuidle: change enter_s2idle() prototype · efe97112

Committed by Neal Liu on Jul 27, 2020

Control Flow Integrity (CFI) is a security mechanism that disallows
changes to the original control flow graph of a compiled binary,
making control-flow hijacking attacks significantly harder to perform.

init_state_node() assigns the same function callback to function
pointers with different declarations.

static int init_state_node(struct cpuidle_state *idle_state,
                           const struct of_device_id *matches,
                           struct device_node *state_node) { ...
        idle_state->enter = match_id->data; ...
        idle_state->enter_s2idle = match_id->data; }

Function declarations:

struct cpuidle_state { ...
        int (*enter) (struct cpuidle_device *dev,
                      struct cpuidle_driver *drv,
                      int index);

        void (*enter_s2idle) (struct cpuidle_device *dev,
                              struct cpuidle_driver *drv,
                              int index); };

In this case, either enter() or enter_s2idle() would cause a CFI check
failure since they use the same callee.

Align the prototype of enter_s2idle() with that of enter(), since
enter() needs a return value for some use cases. The return value of
enter_s2idle() is currently unused.
Signed-off-by: Neal Liu <neal.liu@mediatek.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

efe97112
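
The end state of the commit above, where both callbacks share one prototype so a single function can be assigned to either pointer without a type mismatch, can be sketched like this. The structs are reduced to the minimum needed to compile, and `demo_enter()` and `init_state_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures. */
struct cpuidle_device { int last_index; };
struct cpuidle_driver { int unused; };

struct cpuidle_state {
	int (*enter)(struct cpuidle_device *dev,
		     struct cpuidle_driver *drv, int index);
	/* After the change: same prototype as enter(), returning int. */
	int (*enter_s2idle)(struct cpuidle_device *dev,
			    struct cpuidle_driver *drv, int index);
};

int demo_enter(struct cpuidle_device *dev, struct cpuidle_driver *drv,
	       int index)
{
	assert(dev != NULL);
	(void)drv;
	dev->last_index = index;
	return index;
}

/* One callee, two pointers, identical types: no CFI type mismatch. */
void init_state_sketch(struct cpuidle_state *st)
{
	assert(st != NULL);
	st->enter = demo_enter;
	st->enter_s2idle = demo_enter;
}
```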

29 Jul, 2020 · 12 commits

nvmem: core: add support to auto devid · 731aa3fa

Committed by Srinivas Kandagatla on Jul 22, 2020

For nvmem providers which have multiple instances, it is required
to suffix the provider name with a proper id, so that they do not
conflict for the same name. Currently the core does not handle
this case properly even though the core already has logic to generate the id.

This patch adds a new devid type, NVMEM_DEVID_AUTO, so that providers
can let the core assign an id and append it to the provider name.
Reported-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20200722100705.7772-8-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

731aa3fa

nvmem: core: Add nvmem_cell_read_u8() · 5037d368

Committed by Andreas Färber on Jul 22, 2020

Complement the u16, u32 and u64 helpers with a u8 variant to ease
accessing byte-sized values.

This helper will be useful for Realtek Digital Home Center platforms,
which store some byte and sub-byte sized values in non-volatile memory.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200722100705.7772-7-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5037d368

seqlock: lockdep assert non-preemptibility on seqcount_t write · 859247d3

Committed by Ahmed S. Darwish on Jul 20, 2020

Preemption must be disabled before entering a sequence count write side
critical section. Failing to do so, the seqcount read side can preempt
the write side section and spin for the entire scheduler tick. If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.

Assert through lockdep that preemption is disabled for seqcount writers.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-9-a.darwish@linutronix.de

859247d3

lockdep: Add preemption enabled/disabled assertion APIs · 8fd8ad5c

Committed by Ahmed S. Darwish on Jul 20, 2020

Asserting that preemption is enabled or disabled is a critical sanity
check. Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.

Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, this assert has no runtime overhead.

References: f54bb2ec ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-8-a.darwish@linutronix.de

8fd8ad5c

seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() · 932e4636

Committed by Ahmed S. Darwish on Jul 20, 2020

raw_seqcount_begin() has the same code as raw_read_seqcount(), with the
exception of masking the sequence counter's LSB before returning it to
the caller.

Note, raw_seqcount_begin() masks the counter's LSB before returning it
to the caller so that read_seqcount_retry() can fail if the counter is
odd -- without the overhead of an extra branching instruction.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-7-a.darwish@linutronix.de

932e4636
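
The LSB-masking trick explained in the commit above can be tried out with C11 atomics. This is a userspace approximation under assumptions: the `*_sketch` functions and `struct seqcount` are invented for illustration, and the memory ordering only loosely imitates the kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct seqcount { atomic_uint sequence; };

/* Returns the raw counter, which is odd while a writer is active. */
unsigned int raw_read_seqcount_sketch(struct seqcount *s)
{
	assert(s != NULL);
	return atomic_load_explicit(&s->sequence, memory_order_acquire);
}

/*
 * Same read, but with the LSB cleared. If a writer was active (odd
 * counter), the returned value can never equal the live counter, so
 * the retry check below fails without an extra "is it odd?" branch.
 */
unsigned int raw_seqcount_begin_sketch(struct seqcount *s)
{
	return raw_read_seqcount_sketch(s) & ~1u;
}

/* True when the read section must be retried. */
bool read_seqcount_retry_sketch(struct seqcount *s, unsigned int start)
{
	assert(s != NULL);
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}
```

With the counter at 3 (writer in progress) the begin helper returns 2 and the retry check reports a mismatch; with the counter at 4 and no intervening write, the reader's snapshot is accepted.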

seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs · 89b88845

Committed by Ahmed S. Darwish on Jul 20, 2020

seqlock.h is now included by the kernel's RST documentation, but only a
small number of the exported seqlock.h functions are kernel-doc annotated.

Add kernel-doc for all seqlock.h exported APIs.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-6-a.darwish@linutronix.de

89b88845

seqlock: Reorder seqcount_t and seqlock_t API definitions · f4a27cbc

Committed by Ahmed S. Darwish on Jul 20, 2020

The seqlock.h seqcount_t and seqlock_t API definitions are presented in
the chronological order of their development rather than the order that
makes most sense to readers. This makes it hard to follow and understand
the header file code.

Group and reorder all of the exported seqlock.h functions according to
their function.

First, group together the seqcount_t standard read path functions:

    - __read_seqcount_begin()
    - raw_read_seqcount_begin()
    - read_seqcount_begin()

since each function is implemented exactly in terms of the one above
it. Then, group the special-case seqcount_t readers on their own as:

    - raw_read_seqcount()
    - raw_seqcount_begin()

since the only difference between the two functions is that the second
one masks the sequence counter LSB while the first one does not. Note
that raw_seqcount_begin() can actually be implemented in terms of
raw_read_seqcount(), which will be done in a follow-up commit.

Then, group the seqcount_t write path functions, instead of injecting
unrelated seqcount_t latch functions between them, and order them as:

    - raw_write_seqcount_begin()
    - raw_write_seqcount_end()
    - write_seqcount_begin_nested()
    - write_seqcount_begin()
    - write_seqcount_end()
    - raw_write_seqcount_barrier()
    - write_seqcount_invalidate()

which is the expected natural order. This also isolates the seqcount_t
latch functions into their own area, at the end of the sequence counters
section, and before jumping to the next one: sequential locks
(seqlock_t).

Do a similar grouping and reordering for seqlock_t "locking" readers vs.
the "conditionally locking or lockless" ones.

No implementation code was changed in any of the reordering above.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-5-a.darwish@linutronix.de

f4a27cbc

seqlock: seqcount_t latch: End read sections with read_seqcount_retry() · d3b35b87

Committed by Ahmed S. Darwish on Jul 20, 2020

The seqcount_t latch reader example at the raw_write_seqcount_latch()
kernel-doc comment ends the latch read section with a manual smp memory
barrier and sequence counter comparison.

This is technically correct, but it is suboptimal: read_seqcount_retry()
already contains the same logic of an smp memory barrier and sequence
counter comparison.

End the latch read critical section example with read_seqcount_retry().

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-4-a.darwish@linutronix.de

d3b35b87
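
A latch read section that ends with a retry helper, as the commit above recommends, might look like the following user-space sketch. All `demo_*` names are hypothetical; the code uses sequentially consistent C11 atomics for simplicity, which is stronger ordering than the kernel's barriers, and it is not the kernel-doc example itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical latch: two data copies selected by the counter's LSB. */
struct demo_latch {
	atomic_uint seq;
	atomic_int data[2];
};

static void demo_latch_init(struct demo_latch *l)
{
	atomic_init(&l->seq, 0);
	atomic_init(&l->data[0], 0);
	atomic_init(&l->data[1], 0);
}

/* Retry helper: true when the counter changed during the read section. */
static bool demo_read_retry(struct demo_latch *l, unsigned start)
{
	return atomic_load(&l->seq) != start;
}

/* Writer: update one copy at a time so readers always have a stable one. */
static void demo_latch_write(struct demo_latch *l, int val)
{
	atomic_fetch_add(&l->seq, 1);	/* odd: readers use data[1] */
	atomic_store(&l->data[0], val);
	atomic_fetch_add(&l->seq, 1);	/* even: readers use data[0] */
	atomic_store(&l->data[1], val);
}

/* Reader: no manual barrier and comparison, just the retry helper. */
static int demo_latch_read(struct demo_latch *l)
{
	unsigned seq;
	int val;

	do {
		seq = atomic_load(&l->seq);
		val = atomic_load(&l->data[seq & 1]);
	} while (demo_read_retry(l, seq));

	return val;
}

/* Usage: a completed write is visible to the next reader. */
static void demo_latch_usage(void)
{
	struct demo_latch l;

	demo_latch_init(&l);
	demo_latch_write(&l, 7);
	assert(demo_latch_read(&l) == 7);
}
```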

seqlock: Properly format kernel-doc code samples · 15cbe67b

Committed by Ahmed S. Darwish on Jul 20, 2020

Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-3-a.darwish@linutronix.de

15cbe67b

Documentation: locking: Describe seqlock design and usage · 0d24f65e

Committed by Ahmed S. Darwish on Jul 20, 2020

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
    not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
    and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.9.

References: 0f6ed63b ("no need to keep brlock macros anymore...")
References: 6ec4476a ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-2-a.darwish@linutronix.de

0d24f65e

locking/atomic: Move ATOMIC_INIT into linux/types.h · 7ca8cf53

Committed by Herbert Xu on Jul 29, 2020

This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h, since including it may lead to header loops.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/20200729123105.GB7047@gondor.apana.org.au

7ca8cf53
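
To illustrate why a static initializer macro can live next to the type definition, here is a user-space sketch modeled on the shape of the kernel's atomic_t. The `demo_atomic_t`, `DEMO_ATOMIC_INIT`, and `struct demo_device` names are hypothetical; only the type and the initializer are needed, not any atomic operations.

```c
#include <assert.h>

/* Type and initializer together, as they would sit in a shared types header. */
typedef struct {
	int counter;
} demo_atomic_t;

#define DEMO_ATOMIC_INIT(i)	{ (i) }

/* The wrapper adds no storage beyond the counter itself. */
static_assert(sizeof(demo_atomic_t) == sizeof(int), "unexpected padding");

/* A header can now declare statically initialized objects without pulling
 * in the header that implements the atomic operations. */
struct demo_device {
	demo_atomic_t refcount;
};

static struct demo_device demo_dev = {
	.refcount = DEMO_ATOMIC_INIT(1),
};
```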

sched/uclamp: Add a new sysctl to control RT default boost value · 13685c4a

Committed by Qais Yousef on Jul 16, 2020

RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.

This is also referred to as 'the default boost value of RT tasks'.

See commit 1a00d999 ("sched/uclamp: Set default clamps for RT tasks").

On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.

For example, for mobile device manufacturers, where the big.LITTLE
architecture is dominant, the performance of the little cores varies
across SoCs, and on high-end ones the big cores could be too power hungry.

Given the diversity of SoCs, the new knob allows manufacturers to tune
the best performance/power for RT tasks for the particular hardware they
run on.

They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.

The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.

Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which are created by the Android framework.

To control the default behavior globally by system admins and device
integrators, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.

I anticipate this to be mostly in the form of modifying the init script
of a particular device.

To avoid polluting the fast path with unnecessary code, the approach
taken is to synchronously do the update by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing sched_post_fork() function which will ensure
the racy fork will get the right update applied.

Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.

With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.

[1] 804d402f: ("sched/rt: Make RT capacity-aware")

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com

13685c4a
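
As a rough sketch of how an init-time tool might set the RT default boost described in the commit above, the helper below writes a clamp value to a sysctl file. The helper name and the range constant are hypothetical. The path `/proc/sys/kernel/sched_util_clamp_min_rt_default` is an assumption that does not appear in the commit message, so the path is passed in as a parameter.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed upper bound, matching SCHED_CAPACITY_SCALE (1024) in the kernel. */
#define DEMO_CAPACITY_SCALE	1024u

/*
 * Write 'value' to the sysctl file at 'path'.
 * Returns 0 on success, -1 on a range error or I/O failure.
 */
static int demo_set_rt_default_boost(const char *path, unsigned value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	if (value > DEMO_CAPACITY_SCALE)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A device init script would more likely use a plain shell redirect; the C form is shown only to keep the examples on this page in one language.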

openeuler / Kernel synced successfully about 2 years ago