提交 · 2ebc499a0f4f7984b50b27044a337409df9292b3 · openeuler / Kernel

24 10月, 2019 2 次提交

net: phy: broadcom: add 1000Base-X support for BCM54616S · b9bcb953

由 Tao Ren 提交于 10月 22, 2019

The BCM54616S PHY cannot work properly in RGMII->1000Base-X mode, mainly
because genphy functions are designed for copper links, and 1000Base-X
(clause 37) auto negotiation needs to be handled differently.

This patch enables 1000Base-X support for BCM54616S by customizing 3
driver callbacks, and it's verified to be working on Facebook CMM BMC
platform (RGMII->1000Base-KX):

  - probe: probe callback detects PHY's operation mode based on
    INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
    Control register.

  - config_aneg: calls genphy_c37_config_aneg when the PHY is running in
    1000Base-X mode; otherwise, genphy_config_aneg will be called.

  - read_status: calls genphy_c37_read_status when the PHY is running in
    1000Base-X mode; otherwise, genphy_read_status will be called.

Note: BCM54616S PHY can also be configured in RGMII->100Base-FX mode, and
100Base-FX support is not available as of now.
Signed-off-by: NTao Ren <taoren@fb.com>
Acked-by: NVladimir Oltean <olteanv@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9bcb953

net: phy: add support for clause 37 auto-negotiation · fa6e98ce

由 Heiner Kallweit 提交于 10月 22, 2019

This patch adds support for clause 37 1000Base-X auto-negotiation.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NTao Ren <taoren@fb.com>
Tested-by: NRené van Dorst <opensource@vdorst.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa6e98ce

18 10月, 2019 1 次提交

net: phy: micrel: Update KSZ87xx PHY name · 1d951ba3

由 Marek Vasut 提交于 10月 16, 2019

The KSZ8795 PHY ID is in fact used by KSZ8794/KSZ8795/KSZ8765 switches.
Update the PHY ID and name to reflect that, as this family of switches
is commonly refered to as KSZ87xx
Signed-off-by: NMarek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d951ba3

17 10月, 2019 2 次提交

net: sfp: move fwnode parsing into sfp-bus layer · 2203cbf2

由 Russell King 提交于 10月 15, 2019

Rather than parsing the sfp firmware node in phylink, parse it in the
sfp-bus code, so we can re-use this code for PHYs without having to
duplicate the parsing.
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2203cbf2

arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled · 19c95f26

由 Julien Thierry 提交于 10月 15, 2019

Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.

However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.

* Task 1:

PAN == 0 ---|                          |---------------
            |                          |<- return from IRQ, PSTATE.PAN = 0
            | <- IRQ                   |
            +--------+ <- preempt()  +--
                                     ^
                                     |
                                     reschedule Task 1, PSTATE.PAN == 1
* Init:
        --------------------+------------------------
                            ^
                            |
                            enable_cpu_features
                            set PSTATE.PAN on all CPUs

Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).

Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.

This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: NJulien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
 expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NWill Deacon <will@kernel.org>

19c95f26

16 10月, 2019 2 次提交

net: phylink: use more linkmode_* · 554032cd

由 Russell King 提交于 10月 15, 2019

Use more linkmode_* helpers rather than open-coding the bitmap
operations.
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

554032cd

net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions · fa4e0f88

由 Davide Caratti 提交于 10月 12, 2019

the following script:

 # tc qdisc add dev eth0 clsact
 # tc filter add dev eth0 egress protocol ip matchall \
 > action mpls push protocol mpls_uc label 0x355aa bos 1

causes corruption of all IP packets transmitted by eth0. On TC egress, we
can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push'
operation will result in an overwrite of the first 4 octets in the packet
L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
error pattern is present also in the MPLS 'pop' operation. Fix this error
in act_mpls data plane, computing 'mac_len' as the difference between the
network header and the mac header (when not at TC ingress), and use it in
MPLS 'push'/'pop' core functions.

v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
    in skb_mpls_pop(), reported by kbuild test robot

CC: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Acked-by: NJohn Hurley <john.hurley@netronome.com>
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa4e0f88

15 10月, 2019 5 次提交

gpiolib: Initialize the hardware with a callback · 9411e3aa

由 Andy Shevchenko 提交于 10月 09, 2019

After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.

Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.

This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>

9411e3aa

xarray.h: fix kernel-doc warning · 13bea898

由 Randy Dunlap 提交于 10月 14, 2019

Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:

  include/linux/xarray.h:232: WARNING: Unexpected indentation.

Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f9 ("XArray: Redesign xa_alloc API")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

13bea898

bitmap.h: fix kernel-doc warning and typo · 2a7e582f

由 Randy Dunlap 提交于 10月 14, 2019

Fix kernel-doc warning in <linux/bitmap.h>:

  include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'

Also fix small typo (bitnaps).

Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
Fixes: b9fa6442 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2a7e582f

mm, page_owner: rename flag indicating that page is allocated · fdf3bf80

由 Vlastimil Babka 提交于 10月 14, 2019

Commit 37389167 ("mm, page_owner: keep owner info when freeing the
page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
is tracked as being allocated.  Kirril suggested naming it
PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
loaded term for a page".

Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fdf3bf80

mm, page_owner: fix off-by-one error in __set_page_owner_handle() · 5556cfe8

由 Vlastimil Babka 提交于 10月 14, 2019

Patch series "followups to debug_pagealloc improvements through
page_owner", v3.

These are followups to [1] which made it to Linus meanwhile.  Patches 1
and 3 are based on Kirill's review, patch 2 on KASAN request [2].  It
would be nice if all of this made it to 5.4 with [1] already there (or
at least Patch 1).

This patch (of 3):

As noted by Kirill, commit 7e2f2a0c ("mm, page_owner: record page
owner for each subpage") has introduced an off-by-one error in
__set_page_owner_handle() when looking up page_ext for subpages.  As a
result, the head page page_owner info is set twice, while for the last
tail page, it's not set at all.

Fix this and also make the code more efficient by advancing the page_ext
pointer we already have, instead of calling lookup_page_ext() for each
subpage.  Since the full size of struct page_ext is not known at compile
time, we can't use a simple page_ext++ statement, so introduce a
page_ext_next() inline function for that.

Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
Fixes: 7e2f2a0c ("mm, page_owner: record page owner for each subpage")
Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
Reported-by: NKirill A. Shutemov <kirill@shutemov.name>
Reported-by: NMiles Chen <miles.chen@mediatek.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5556cfe8

14 10月, 2019 1 次提交

tcp: add rcu protection around tp->fastopen_rsk · d983ea6f

由 Eric Dumazet 提交于 10月 10, 2019

Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :

	fastopen = tp->fastopen_rsk;
 	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;

The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.

request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.

Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d983ea6f

12 10月, 2019 2 次提交

bpf: Align struct bpf_prog_stats · 84a081f6

由 Eric Dumazet 提交于 10月 11, 2019

Do not risk spanning these small structures on two cache lines.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191011181140.2898-1-edumazet@google.com

84a081f6

compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use · 294f69e6

由 Joe Perches 提交于 10月 05, 2019

Reserve the pseudo keyword 'fallthrough' for the ability to convert the
various case block /* fallthrough */ style comments to appear to be an
actual reserved word with the same gcc case block missing fallthrough
warning capability.

All switch/case blocks now should end in one of:

	break;
	fallthrough;
	goto <label>;
	return [expression];
	continue;

In C mode, GCC supports the __fallthrough__ attribute since 7.1,
the same time the warning and the comment parsing were introduced.

fallthrough devolves to an empty "do {} while (0)" if the compiler
version (any version less than gcc 7) does not support the attribute.
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: NDan Carpenter <dan.carpenter@oracle.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

294f69e6

11 10月, 2019 1 次提交

SUNRPC: fix race to sk_err after xs_error_report · af84537d

由 Benjamin Coddington 提交于 10月 02, 2019

Since commit 4f8943f8 ("SUNRPC: Replace direct task wakeups from
softirq context") there has been a race to the value of the sk_err if both
XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set.  In that case,
we may end up losing the sk_err value that existed when xs_error_report was
called.

Fix this by reverting to the previous behavior: instead of using SO_ERROR
to retrieve the value at a later time (which might also return sk_err_soft),
copy the sk_err value onto struct sock_xprt, and use that value to wake
pending tasks.
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Fixes: 4f8943f8 ("SUNRPC: Replace direct task wakeups from softirq context")
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

af84537d

10 10月, 2019 1 次提交

DIM: fix dim.h kernel-doc and headers · 690a6ca7

由 Randy Dunlap 提交于 10月 08, 2019

Lots of fixes to kernel-doc in structs, enums, and functions.
Also add header files that are being used but not yet #included.
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Yamin Friedman <yaminf@mellanox.com>
Cc: Tal Gilboa <talgi@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

690a6ca7

09 10月, 2019 3 次提交

Revert "tun: call dev_get_valid_name() before register_netdevice()" · bacb7e18

由 Eric Dumazet 提交于 10月 08, 2019

This reverts commit 0ad646c8.

As noticed by Jakub, this is no longer needed after
commit 11fc7d5a ("tun: fix memory leak in error path")

This no longer exports dev_get_valid_name() for the exclusive
use of tun driver.
Suggested-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

bacb7e18

leds: core: Fix leds.h structure documentation · e3f12714

由 Dan Murphy 提交于 10月 02, 2019

Update the leds.h structure documentation to define the
correct arguments.
Signed-off-by: NDan Murphy <dmurphy@ti.com>
Signed-off-by: NJacek Anaszewski <jacek.anaszewski@gmail.com>

e3f12714

spi: Add a PTP system timestamp to the transfer structure · 79591b7d

由 Vladimir Oltean 提交于 9月 05, 2019

SPI is one of the interfaces used to access devices which have a POSIX
clock driver (real time clocks, 1588 timers etc). The fact that the SPI
bus is slow is not what the main problem is, but rather the fact that
drivers don't take a constant amount of time in transferring data over
SPI. When there is a high delay in the readout of time, there will be
uncertainty in the value that has been read out of the peripheral.
When that delay is constant, the uncertainty can at least be
approximated with a certain accuracy which is fine more often than not.

Timing jitter occurs all over in the kernel code, and is mainly caused
by having to let go of the CPU for various reasons such as preemption,
servicing interrupts, going to sleep, etc. Another major reason is CPU
dynamic frequency scaling.

It turns out that the problem of retrieving time from a SPI peripheral
with high accuracy can be solved by the use of "PTP system
timestamping" - a mechanism to correlate the time when the device has
snapshotted its internal time counter with the Linux system time at that
same moment. This is sufficient for having a precise time measurement -
it is not necessary for the whole SPI transfer to be transmitted "as
fast as possible", or "as low-jitter as possible". The system has to be
low-jitter for a very short amount of time to be effective.

This patch introduces a PTP system timestamping mechanism in struct
spi_transfer. This is to be used by SPI device drivers when they need to
know the exact time at which the underlying device's time was
snapshotted. More often than not, SPI peripherals have a very exact
timing for when their SPI-to-interconnect bridge issues a transaction
for snapshotting and reading the time register, and that will be
dependent on when the SPI-to-interconnect bridge figures out that this
is what it should do, aka as soon as it sees byte N of the SPI transfer.
Since spi_device drivers are the ones who'd know best how the peripheral
behaves in this regard, expose a mechanism in spi_transfer which allows
them to specify which word (or word range) from the transfer should be
timestamped.

Add a default implementation of the PTP system timestamping in the SPI
core. This is not going to be satisfactory performance-wise, but should
at least increase the likelihood that SPI device drivers will use PTP
system timestamping in the future.
There are 3 entry points from the core towards the SPI controller
drivers:

- transfer_one: The driver is passed individual spi_transfers to
execute. This is the easiest to timestamp.

- transfer_one_message: The core passes the driver an entire spi_message
(a potential batch of spi_transfers). The core puts the same pre and
post timestamp to all transfers within a message. This is not ideal,
but nothing better can be done by default anyway, since the core has
no insight into how the driver batches the transfers.

- transfer: Like transfer_one_message, but for unqueued drivers (i.e.
the driver implements its own queue scheduling).
Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-3-olteanv@gmail.comSigned-off-by: NMark Brown <broonie@kernel.org>

79591b7d

08 10月, 2019 9 次提交

lib/string: Make memzero_explicit() inline instead of external · bec50077

由 Arvind Sankar 提交于 10月 07, 2019

With the use of the barrier implied by barrier_data(), there is no need
for memzero_explicit() to be extern. Making it inline saves the overhead
of a function call, and allows the code to be reused in arch/*/purgatory
without having to duplicate the implementation.
Tested-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: NHans de Goede <hdegoede@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Fixes: 906a4bb9 ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit")
Link: https://lkml.kernel.org/r/20191007220000.GA408752@rani.riverdale.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

bec50077

mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) · 59bb4798

由 Vlastimil Babka 提交于 10月 06, 2019

In most configurations, kmalloc() happens to return naturally aligned
(i.e.  aligned to the block size itself) blocks for power of two sizes.

That means some kmalloc() users might unknowingly rely on that
alignment, until stuff breaks when the kernel is built with e.g.
CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned.  Then
developers have to devise workaround such as own kmem caches with
specified alignment [1], which is not always practical, as recently
evidenced in [2].

The topic has been discussed at LSF/MM 2019 [3].  Adding a
'kmalloc_aligned()' variant would not help with code unknowingly relying
on the implicit alignment.  For slab implementations it would either
require creating more kmalloc caches, or allocate a larger size and only
give back part of it.  That would be wasteful, especially with a generic
alignment parameter (in contrast with a fixed alignment to size).

Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations.  What this means for the three available allocators?

* SLAB object layout happens to be mostly unchanged by the patch.  The
  implicitly provided alignment could be compromised with
  CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
  caches with alignment larger than unsigned long long.  Practically on at
  least x86 this includes kmalloc caches as they use cache line alignment,
  which is larger than that.  Still, this patch ensures alignment on all
  arches and cache sizes.

* SLUB layout is also unchanged unless redzoning is enabled through
  CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
  With this patch, explicit alignment is guaranteed with redzoning as
  well.  This will result in more memory being wasted, but that should be
  acceptable in a debugging scenario.

* SLOB has no implicit alignment so this patch adds it explicitly for
  kmalloc().  The potential downside is increased fragmentation.  While
  pathological allocation scenarios are certainly possible, in my testing,
  after booting a x86_64 kernel+userspace with virtme, around 16MB memory
  was consumed by slab pages both before and after the patch, with
  difference in the noise.

[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/

[akpm@linux-foundation.org: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

59bb4798

mm, memcg: make scan aggression always exclude protection · 1bc63fb1

由 Chris Down 提交于 10月 06, 2019

This patch is an incremental improvement on the existing
memory.{low,min} relative reclaim work to base its scan pressure
calculations on how much protection is available compared to the current
usage, rather than how much the current usage is over some protection
threshold.

This change doesn't change the experience for the user in the normal
case too much. One benefit is that it replaces the (somewhat arbitrary)
100% cutoff with an indefinite slope, which makes it easier to ballpark
a memory.low value.

As well as this, the old methodology doesn't quite apply generically to
machines with varying amounts of physical memory. Let's say we have a
top level cgroup, workload.slice, and another top level cgroup,
system-management.slice. We want to roughly give 12G to
system-management.slice, so on a 32GB machine we set memory.low to 20GB
in workload.slice, and on a 64GB machine we set memory.low to 52GB.
However, because these are relative amounts to the total machine size,
while the amount of memory we want to generally be willing to yield to
system.slice is absolute (12G), we end up putting more pressure on
system.slice just because we have a larger machine and a larger workload
to fill it, which seems fairly unintuitive. With this new behaviour, we
don't end up with this unintended side effect.

Previously the way that memory.low protection works is that if you are
50% over a certain baseline, you get 50% of your normal scan pressure.
This is certainly better than the previous cliff-edge behaviour, but it
can be improved even further by always considering memory under the
currently enforced protection threshold to be out of bounds. This means
that we can set relatively low memory.low thresholds for variable or
bursty workloads while still getting a reasonable level of protection,
whereas with the previous version we may still trivially hit the 100%
clamp. The previous 100% clamp is also somewhat arbitrary, whereas this
one is more concretely based on the currently enforced protection
threshold, which is likely easier to reason about.

There is also a subtle issue with the way that proportional reclaim
worked previously -- it promotes having no memory.low, since it makes
pressure higher during low reclaim. This happens because we base our
scan pressure modulation on how far memory.current is between memory.min
and memory.low, but if memory.low is unset, we only use the overage
method. In most cromulent configurations, this then means that we end
up with *more* pressure than with no memory.low at all when we're in low
reclaim, which is not really very usable or expected.

With this patch, memory.low and memory.min affect reclaim pressure in a
more understandable and composable way. For example, from a user
standpoint, "protected" memory now remains untouchable from a reclaim
aggression standpoint, and users can also have more confidence that
bursty workloads will still receive some amount of guaranteed
protection.

Link: http://lkml.kernel.org/r/20190322160307.GA3316@chrisdown.nameSigned-off-by: NChris Down <chris@chrisdown.name>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1bc63fb1

mm, memcg: make memory.emin the baseline for utilisation determination · 9de7ca46

由 Chris Down 提交于 10月 06, 2019

Roman points out that when when we do the low reclaim pass, we scale the
reclaim pressure relative to position between 0 and the maximum
protection threshold.

However, if the maximum protection is based on memory.elow, and
memory.emin is above zero, this means we still may get binary behaviour
on second-pass low reclaim.  This is because we scale starting at 0, not
starting at memory.emin, and since we don't scan at all below emin, we
end up with cliff behaviour.

This should be a fairly uncommon case since usually we don't go into the
second pass, but it makes sense to scale our low reclaim pressure
starting at emin.

You can test this by catting two large sparse files, one in a cgroup
with emin set to some moderate size compared to physical RAM, and
another cgroup without any emin.  In both cgroups, set an elow larger
than 50% of physical RAM.  The one with emin will have less page
scanning, as reclaim pressure is lower.

Rebase on top of and apply the same idea as what was applied to handle
cgroup_memory=disable properly for the original proportional patch
http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name ("mm,
memcg: Handle cgroup_disable=memory when getting memcg protection").

Link: http://lkml.kernel.org/r/20190201051810.GA18895@chrisdown.nameSigned-off-by: NChris Down <chris@chrisdown.name>
Suggested-by: NRoman Gushchin <guro@fb.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9de7ca46

mm, memcg: proportional memory.{low,min} reclaim · 9783aa99

由 Chris Down 提交于 10月 06, 2019

cgroup v2 introduces two memory protection thresholds: memory.low
(best-effort) and memory.min (hard protection).  While they generally do
what they say on the tin, there is a limitation in their implementation
that makes them difficult to use effectively: that cliff behaviour often
manifests when they become eligible for reclaim.  This patch implements
more intuitive and usable behaviour, where we gradually mount more
reclaim pressure as cgroups further and further exceed their protection
thresholds.

This cliff edge behaviour happens because we only choose whether or not
to reclaim based on whether the memcg is within its protection limits
(see the use of mem_cgroup_protected in shrink_node), but we don't vary
our reclaim behaviour based on this information.  Imagine the following
timeline, with the numbers the lruvec size in this zone:

1. memory.low=1000000, memory.current=999999. 0 pages may be scanned.
2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned.
3. memory.low=1000000, memory.current=1000001. 1000001* pages may be
   scanned. (?!)

* Of course, we won't usually scan all available pages in the zone even
  without this patch because of scan control priority, over-reclaim
  protection, etc.  However, as shown by the tests at the end, these
  techniques don't sufficiently throttle such an extreme change in input,
  so cliff-like behaviour isn't really averted by their existence alone.

Here's an example of how this plays out in practice.  At Facebook, we are
trying to protect various workloads from "system" software, like
configuration management tools, metric collectors, etc (see this[0] case
study).  In order to find a suitable memory.low value, we start by
determining the expected memory range within which the workload will be
comfortable operating.  This isn't an exact science -- memory usage deemed
"comfortable" will vary over time due to user behaviour, differences in
composition of work, etc, etc.  As such we need to ballpark memory.low,
but doing this is currently problematic:

1. If we end up setting it too low for the workload, it won't have
   *any* effect (see discussion above).  The group will receive the full
   weight of reclaim and won't have any priority while competing with the
   less important system software, as if we had no memory.low configured
   at all.

2. Because of this behaviour, we end up erring on the side of setting
   it too high, such that the comfort range is reliably covered.  However,
   protected memory is completely unavailable to the rest of the system,
   so we might cause undue memory and IO pressure there when we *know* we
   have some elasticity in the workload.

3. Even if we get the value totally right, smack in the middle of the
   comfort zone, we get extreme jumps between no pressure and full
   pressure that cause unpredictable pressure spikes in the workload due
   to the current binary reclaim behaviour.

With this patch, we can set it to our ballpark estimation without too much
worry.  Any undesirable behaviour, such as too much or too little reclaim
pressure on the workload or system will be proportional to how far our
estimation is off.  This means we can set memory.low much more
conservatively and thus waste less resources *without* the risk of the
workload falling off a cliff if we overshoot.

As a more abstract technical description, this unintuitive behaviour
results in having to give high-priority workloads a large protection
buffer on top of their expected usage to function reliably, as otherwise
we have abrupt periods of dramatically increased memory pressure which
hamper performance.  Having to set these thresholds so high wastes
resources and generally works against the principle of work conservation.
In addition, having proportional memory reclaim behaviour has other
benefits.  Most notably, before this patch it's basically mandatory to set
memory.low to a higher than desirable value because otherwise as soon as
you exceed memory.low, all protection is lost, and all pages are eligible
to scan again.  By contrast, having a gradual ramp in reclaim pressure
means that you now still get some protection when thresholds are exceeded,
which means that one can now be more comfortable setting memory.low to
lower values without worrying that all protection will be lost.  This is
important because workingset size is really hard to know exactly,
especially with variable workloads, so at least getting *some* protection
if your workingset size grows larger than you expect increases user
confidence in setting memory.low without a huge buffer on top being
needed.

Thanks a lot to Johannes Weiner and Tejun Heo for their advice and
assistance in thinking about how to make this work better.

In testing these changes, I intended to verify that:

1. Changes in page scanning become gradual and proportional instead of
   binary.

   To test this, I experimented stepping further and further down
   memory.low protection on a workload that floats around 19G workingset
   when under memory.low protection, watching page scan rates for the
   workload cgroup:

   +------------+-----------------+--------------------+--------------+
   | memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
   +------------+-----------------+--------------------+--------------+
   |        21G |               0 |                  0 | N/A          |
   |        17G |             867 |               3799 | 23%          |
   |        12G |            1203 |               3543 | 34%          |
   |         8G |            2534 |               3979 | 64%          |
   |         4G |            3980 |               4147 | 96%          |
   |          0 |            3799 |               3980 | 95%          |
   +------------+-----------------+--------------------+--------------+

   As you can see, the test kernel (with a kernel containing this
   patch) ramps up page scanning significantly more gradually than the
   control kernel (without this patch).

2. More gradual ramp up in reclaim aggression doesn't result in
   premature OOMs.

   To test this, I wrote a script that slowly increments the number of
   pages held by stress(1)'s --vm-keep mode until a production system
   entered severe overall memory contention.  This script runs in a highly
   protected slice taking up the majority of available system memory.
   Watching vmstat revealed that page scanning continued essentially
   nominally between test and control, without causing forward reclaim
   progress to become arrested.

[0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project

[akpm@linux-foundation.org: reflow block comments to fit in 80 cols]
[chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection]
  Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name
Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.nameSigned-off-by: NChris Down <chris@chrisdown.name>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9783aa99

memcg: only record foreign writebacks with dirty pages when memcg is not disabled · 08d1d0e6

由 Baoquan He 提交于 10月 06, 2019

In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory'
for saving memory.  Now kdump kernel will always panic when dump vmcore
to local disk:

  BUG: kernel NULL pointer dereference, address: 0000000000000ab8
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26
  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
  RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140
  Call Trace:
   __set_page_dirty+0x52/0xc0
   iomap_set_page_dirty+0x50/0x90
   iomap_write_end+0x6e/0x270
   iomap_write_actor+0xce/0x170
   iomap_apply+0xba/0x11e
   iomap_file_buffered_write+0x62/0x90
   xfs_file_buffered_aio_write+0xca/0x320 [xfs]
   new_sync_write+0x12d/0x1d0
   vfs_write+0xa5/0x1a0
   ksys_write+0x59/0xd0
   do_syscall_64+0x59/0x1e0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

And this will corrupt the 1st kernel too with 'cgroup_disable=memory'.

Via the trace and with debugging, it is pointing to commit 97b27821
("writeback, memcg: Implement foreign dirty flushing") which introduced
this regression.  Disabling memcg causes the null pointer dereference at
uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath().

Fix it by returning directly if memcg is disabled, but not trying to
record the foreign writebacks with dirty pages.

Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv
Fixes: 97b27821 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: NBaoquan He <bhe@redhat.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08d1d0e6

uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to it · c512c691

由 Linus Torvalds 提交于 10月 07, 2019

In commit 9f79b78e ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I made filldir() use unsafe_put_user(), which
improves code generation on x86 enormously.

But because we didn't have a "unsafe_copy_to_user()", the dirent name
copy was also done by hand with unsafe_put_user() in a loop, and it
turns out that a lot of other architectures didn't like that, because
unlike x86, they have various alignment issues.

Most non-x86 architectures trap and fix it up, and some (like xtensa)
will just fail unaligned put_user() accesses unconditionally. Which
makes that "copy using put_user() in a loop" not work for them at all.

I could make that code do explicit alignment etc, but the architectures
that don't like unaligned accesses also don't really use the fancy
"user_access_begin/end()" model, so they might just use the regular old
__copy_to_user() interface.

So this commit takes that looping implementation, turns it into the x86
version of "unsafe_copy_to_user()", and makes other architectures
implement the unsafe copy version as __copy_to_user() (the same way they
do for the other unsafe_xyz() accessor functions).

Note that it only does this for the copying _to_ user space, and we
still don't have a unsafe version of copy_from_user().

That's partly because we have no current users of it, but also partly
because the copy_from_user() case is slightly different and cannot
efficiently be implemented in terms of a unsafe_get_user() loop (because
gcc can't do asm goto with outputs).

It would be trivial to do this using "rep movsb", which would work
really nicely on newer x86 cores, but really badly on some older ones.

Al Viro is looking at cleaning up all our user copy routines to make
this all a non-issue, but for now we have this simple-but-stupid version
for x86 that works fine for the dirent name copy case because those
names are short strings and we simply don't need anything fancier.

Fixes: 9f79b78e ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: NTony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c512c691

module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflict · fa6643cd

由 Masahiro Yamada 提交于 10月 03, 2019

The module namespace produces __strtab_ns_<sym> symbols to store
namespace strings, but it does not guarantee the name uniqueness.
This is a potential problem because we have exported symbols starting
with "ns_".

For example, kernel/capability.c exports the following symbols:

  EXPORT_SYMBOL(ns_capable);
  EXPORT_SYMBOL(capable);

Assume a situation where those are converted as follows:

  EXPORT_SYMBOL_NS(ns_capable, some_namespace);
  EXPORT_SYMBOL_NS(capable, some_namespace);

The former expands to "__kstrtab_ns_capable" and "__kstrtab_ns_ns_capable",
and the latter to "__kstrtab_capable" and "__kstrtab_ns_capable".
Then, we have the duplicated "__kstrtab_ns_capable".

To ensure the uniqueness, rename "__kstrtab_ns_*" to "__kstrtabns_*".
Reviewed-by: NMatthias Maennich <maennich@google.com>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NJessica Yu <jeyu@kernel.org>

fa6643cd

module: swap the order of symbol.namespace · bf70b050

由 Masahiro Yamada 提交于 10月 03, 2019

Currently, EXPORT_SYMBOL_NS(_GPL) constructs the kernel symbol as
follows:

  __ksymtab_SYMBOL.NAMESPACE

The sym_extract_namespace() in modpost allocates memory for the part
SYMBOL.NAMESPACE when '.' is contained. One problem is that the pointer
returned by strdup() is lost because the symbol name will be copied to
malloc'ed memory by alloc_symbol(). No one will keep track of the
pointer of strdup'ed memory.

sym->namespace still points to the NAMESPACE part. So, you can free it
with complicated code like this:

   free(sym->namespace - strlen(sym->name) - 1);

It complicates memory free.

To fix it elegantly, I swapped the order of the symbol and the
namespace as follows:

  __ksymtab_NAMESPACE.SYMBOL

then, simplified sym_extract_namespace() so that it allocates memory
only for the NAMESPACE part.

I prefer this order because it is intuitive and also matches to major
languages. For example, NAMESPACE::NAME in C++, MODULE.NAME in Python.
Reviewed-by: NMatthias Maennich <maennich@google.com>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NJessica Yu <jeyu@kernel.org>

bf70b050

07 10月, 2019 5 次提交

net: sockmap: use bitmap for copy info · 163ab96b

由 Jakub Kicinski 提交于 10月 06, 2019

Don't use bool array in struct sk_msg_sg, save 12 bytes.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

163ab96b

net: core: change return type of pskb_may_pull to bool · b9df4fd7

由 Heiner Kallweit 提交于 10月 06, 2019

This function de-facto returns a bool, so let's change the return type
accordingly.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9df4fd7

efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing · e658c82b

由 Jerry Snitselaar 提交于 10月 02, 2019

If __calc_tpm2_event_size() fails to parse an event it will return 0,
resulting tpm2_calc_event_log_size() returning -1. Currently there is
no check of this return value, and 'efi_tpm_final_log_size' can end up
being set to this negative value resulting in a crash like this one:

  BUG: unable to handle page fault for address: ffffbc8fc00866ad
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page

  RIP: 0010:memcpy_erms+0x6/0x10
  Call Trace:
   tpm_read_log_efi()
   tpm_bios_log_setup()
   tpm_chip_register()
   tpm_tis_core_init.cold.9+0x28c/0x466
   tpm_tis_plat_probe()
   platform_drv_probe()
   ...

Also __calc_tpm2_event_size() returns a size of 0 when it fails
to parse an event, so update function documentation to reflect this.

The root cause of the issue that caused the failure of event parsing
in this case is resolved by Peter Jone's patchset dealing with large
event logs where crossing over a page boundary causes the page with
the event count to be unmapped.
Signed-off-by: NJerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: c46f3405 ("tpm: Reserve the TPM final events table")
Link: https://lkml.kernel.org/r/20191002165904.8819-6-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e658c82b

efi/tpm: Don't access event->count when it isn't mapped · 047d50ae

由 Peter Jones 提交于 10月 02, 2019

Some machines generate a lot of event log entries.  When we're
iterating over them, the code removes the old mapping and adds a
new one, so once we cross the page boundary we're unmapping the page
with the count on it.  Hilarity ensues.

This patch keeps the info from the header in local variables so we don't
need to access that page again or keep track of if it's mapped.
Tested-by: NLyude Paul <lyude@redhat.com>
Signed-off-by: NPeter Jones <pjones@redhat.com>
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Acked-by: NMatthew Garrett <mjg59@google.com>
Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 44038bc5 ("tpm: Abstract crypto agile event size calculations")
Link: https://lkml.kernel.org/r/20191002165904.8819-4-ard.biesheuvel@linaro.org
[ Minor edits. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

047d50ae

driver core: platform: Add platform_get_irq_byname_optional() · f1da567f

由 Hans de Goede 提交于 10月 05, 2019

Some drivers (e.g dwc3) first try to get an IRQ byname and then fall
back to the one at index 0. In this case we do not want the error(s)
printed by platform_get_irq_byname(). This commit adds a new
platform_get_irq_byname_optional(), which does not print errors, for this.

While at it also improve the kdoc text for platform_get_irq_byname() a bit.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205037Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20191005210449.3926-2-hdegoede@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f1da567f

05 10月, 2019 5 次提交

net: phy: extract pause mode · 2d880b87

由 Russell King 提交于 10月 04, 2019

Extract the update of phylib's software pause mode state from
genphy_read_status(), so that we can re-use this functionality with
PHYs that have alternative ways to read the negotiation results.
Tested-by: Ntinywrkb <tinywrkb@gmail.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d880b87

net: phy: extract link partner advertisement reading · 8d3dc3ac

由 Russell King 提交于 10月 04, 2019

Move reading the link partner advertisement out of genphy_read_status()
into its own separate function.  This will allow re-use of this code by
PHY drivers that are able to read the resolved status from the PHY.
Tested-by: Ntinywrkb <tinywrkb@gmail.com>
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d3dc3ac

net: phy: fix write to mii-ctrl1000 register · 4cf6c57e

由 Russell King 提交于 10月 04, 2019

When userspace writes to the MII_ADVERTISE register, we update phylib's
advertising mask and trigger a renegotiation. However, writing to the
MII_CTRL1000 register, which contains the gigabit advertisement, does
neither. This can lead to phylib's copy of the advertisement becoming
de-synced with the values in the PHY register set, which can result in
incorrect negotiation resolution.

Fixes: 5502b218 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cf6c57e

net: fib_notifier: propagate extack down to the notifier block callback · b7a59557

由 Jiri Pirko 提交于 10月 03, 2019

Since errors are propagated all the way up to the caller, propagate
possible extack of the caller all the way down to the notifier block
callback.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7a59557

net: fib_notifier: make FIB notifier per-netns · 7c550daf

由 Jiri Pirko 提交于 10月 03, 2019

Currently all users of FIB notifier only cares about events in init_net.
Later in this patchset, users get interested in other namespaces too.
However, for every registered block user is interested only about one
namespace. Make the FIB notifier registration per-netns and avoid
unnecessary calls of notifier block for other namespaces.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c550daf

04 10月, 2019 1 次提交

net: stmmac: Only enable enhanced addressing mode when needed · 968a2978

由 Thierry Reding 提交于 10月 02, 2019

Enhanced addressing mode is only required when more than 32 bits need to
be addressed. Add a DMA configuration parameter to enable this mode only
when needed.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

968a2978

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功