提交 · 00c569b567c7f1f0da6162868fd02a9f29411805 · openeuler / Kernel

25 12月, 2018 1 次提交

net: dccp: fix kernel crash on module load · c92c81df

由 Peter Oskolkov 提交于 12月 24, 2018

Patch eedbbb0d "net: dccp: initialize (addr,port) ..."
added calling to inet_hashinfo2_init() from dccp_init().

However, inet_hashinfo2_init() is marked as __init(), and
thus the kernel panics when dccp is loaded as module. Removing
__init() tag from inet_hashinfo2_init() is not feasible because
it calls into __init functions in mm.

This patch adds inet_hashinfo2_init_mod() function that can
be called after the init phase is done; changes dccp_init() to
call the new function; un-marks inet_hashinfo2_init() as
exported.

Fixes: eedbbb0d ("net: dccp: initialize (addr,port) ...")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NPeter Oskolkov <posk@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c92c81df

23 12月, 2018 4 次提交

crypto: skcipher - remove remnants of internal IV generators · c79b411e

由 Eric Biggers 提交于 12月 16, 2018

Remove dead code related to internal IV generators, which are no longer
used since they've been replaced with the "seqiv" and "echainiv"
templates.  The removed code includes:

- The "givcipher" (GIVCIPHER) algorithm type.  No algorithms are
  registered with this type anymore, so it's unneeded.

- The "const char *geniv" member of aead_alg, ablkcipher_alg, and
  blkcipher_alg.  A few algorithms still set this, but it isn't used
  anymore except to show via /proc/crypto and CRYPTO_MSG_GETALG.
  Just hardcode "<default>" or "<none>" in those cases.

- The 'skcipher_givcrypt_request' structure, which is never used.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

c79b411e

crypto: api - document missing stats member · bfad6cb3

由 Corentin Labbe 提交于 12月 13, 2018

This patchs adds missing member of stats documentation.
Signed-off-by: NCorentin Labbe <clabbe@baylibre.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

bfad6cb3

crypto: user - remove unused dump functions · 0c99c2a0

由 Corentin Labbe 提交于 12月 13, 2018

This patch removes unused dump functions for crypto_user_stats.
There are remains of the copy/paste of crypto_user_base to
crypto_user_stat and I forgot to remove them.
Signed-off-by: NCorentin Labbe <clabbe@baylibre.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

0c99c2a0

dma-mapping: fix flags in dma_alloc_wc · 0cd60eb1

由 Christoph Hellwig 提交于 12月 22, 2018

We really need the writecombine flag in dma_alloc_wc, fix a stupid
oversight.

Fixes: 7ed1d91a ("dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0cd60eb1

22 12月, 2018 1 次提交

net: drop the unused helper skb_ext_get() · d312d0a6

由 Paolo Abeni 提交于 12月 21, 2018

Such helper is currently unused, and skb extension users are
better off using skb_ext_add()/skb_ext_del(). So let's drop
it.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d312d0a6

21 12月, 2018 13 次提交

Revert "compiler-gcc: disable -ftracer for __noclone functions" · 2bcbd406

由 Sean Christopherson 提交于 12月 20, 2018

The -ftracer optimization was disabled in __noclone as a workaround to
GCC duplicating a blob of inline assembly that happened to define a
global variable.  It has been pointed out that no amount of workarounds
can guarantee the compiler won't duplicate inline assembly[1], and that
disabling the -ftracer optimization has several unintended and nasty
side effects[2][3].

Now that the offending KVM code which required the workaround has
been properly fixed and no longer uses __noclone, remove the -ftracer
optimization tweak from __noclone.

[1] https://lore.kernel.org/lkml/ri6y38lo23g.fsf@suse.cz/T/#u
[2] https://lore.kernel.org/lkml/20181218140105.ajuiglkpvstt3qxs@treble/T/#u
[3] https://patchwork.kernel.org/patch/8707981/#21817015

This reverts commit 95272c29.
Suggested-by: NAndi Kleen <ak@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2bcbd406

kvm: Change offset in kvm_write_guest_offset_cached to unsigned · 7a86dab8

由 Jim Mattson 提交于 12月 14, 2018

Since the offset is added directly to the hva from the
gfn_to_hva_cache, a negative offset could result in an out of bounds
write. The existing BUG_ON only checks for addresses beyond the end of
the gfn_to_hva_cache, not for addresses before the start of the
gfn_to_hva_cache.

Note that all current call sites have non-negative offsets.

Fixes: 4ec6e863 ("kvm: Introduce kvm_write_guest_offset_cached()")
Reported-by: NCfir Cohen <cfir@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NCfir Cohen <cfir@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

7a86dab8

net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQE · 5e0d2eef

由 Tariq Toukan 提交于 11月 21, 2018

Add support for the HW feature of multi-packet WQE in XDP
xmit flow.

The conventional TX descriptor (WQE, Work Queue Element) serves
a single packet. Our HW has support for multi-packet WQE (MPWQE)
in which a single descriptor serves multiple TX packets.

This reduces both the PCI overhead and the CPU cycles wasted on
writing them.

In this patch we add support for the HW feature, which is supported
starting from ConnectX-5.

Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_TX:
We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
milestone.
* Single-port HCA:
	Before:   70 Mpps
	After:   100 Mpps (+42.8%)

* Dual-port HCA:
	Before: 51.7 Mpps
	After:  57.3 Mpps (+10.8%)

* In both cases we tested traffic on one port and for now On Dual-port HCAs
  we see only small gain, we are working to overcome this bottleneck, but
  for the moment only with experimental firmware on dual port HCAs we can
  reach the wanted numbers as seen on Single-port HCAs.

XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5.
Due to a setup limitation, (A) and (B) are on different NUMA nodes,
so absolute performance numbers are not optimal.
Note:
  Below is the transmit rate of (B), not the redirect rate of (A)
  which is in some cases higher.

* (B) is single-port:
	Before:   77 Mpps
	After:    90 Mpps (+16.8%)

* (B) is dual-port:
	Before:  61 Mpps
	After:   72 Mpps (+18%)
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

5e0d2eef

vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver · 7f928917

由 Alexey Kardashevskiy 提交于 12月 20, 2018

POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
pluggable PCIe devices but still have PCIe links which are used
for config space and MMIO. In addition to that the GPUs have 6 NVLinks
which are connected to other GPUs and the POWER9 CPU. POWER9 chips
have a special unit on a die called an NPU which is an NVLink2 host bus
adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each.
These systems also support ATS (address translation services) which is
a part of the NVLink2 protocol. Such GPUs also share on-board RAM
(16GB or 32GB) to the system via the same NVLink2 so a CPU has
cache-coherent access to a GPU RAM.

This exports GPU RAM to the userspace as a new VFIO device region. This
preregisters the new memory as device memory as it might be used for DMA.
This inserts pfns from the fault handler as the GPU memory is not onlined
until the vendor driver is loaded and trained the NVLinks so doing this
earlier causes low level errors which we fence in the firmware so
it does not hurt the host system but still better be avoided; for the same
reason this does not map GPU RAM into the host kernel (usual thing for
emulated access otherwise).

This exports an ATSD (Address Translation Shootdown) register of NPU which
allows TLB invalidations inside GPU for an operating system. The register
conveniently occupies a single 64k page. It is also presented to
the userspace as a new VFIO device region. One NPU has 8 ATSD registers,
each of them can be used for TLB invalidation in a GPU linked to this NPU.
This allocates one ATSD register per an NVLink bridge allowing passing
up to 6 registers. Due to the host firmware bug (just recently fixed),
only 1 ATSD register per NPU was actually advertised to the host system
so this passes that alone register via the first NVLink bridge device in
the group which is still enough as QEMU collects them all back and
presents to the guest via vPHB to mimic the emulated NPU PHB on the host.

In order to provide the userspace with the information about GPU-to-NVLink
connections, this exports an additional capability called "tgt"
(which is an abbreviated host system bus address). The "tgt" property
tells the GPU its own system address and allows the guest driver to
conglomerate the routing information so each GPU knows how to get directly
to the other GPUs.

For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
know LPID (a logical partition ID or a KVM guest hardware ID in other
words) and PID (a memory context ID of a userspace process, not to be
confused with a linux pid). This assigns a GPU to LPID in the NPU and
this is why this adds a listener for KVM on an IOMMU group. A PID comes
via NVLink from a GPU and NPU uses a PID wildcard to pass it through.

This requires coherent memory and ATSD to be available on the host as
the GPU vendor only supports configurations with both features enabled
and other configurations are known not to work. Because of this and
because of the ways the features are advertised to the host system
(which is a device tree with very platform specific properties),
this requires enabled POWERNV platform.

The V100 GPUs do not advertise any of these capabilities via the config
space and there are more than just one device ID so this relies on
the platform to tell whether these GPUs have special abilities such as
NVLinks.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7f928917

net: seg6.h: remove an unused #include · a6ae520d

由 Peter Oskolkov 提交于 12月 20, 2018

A minor code cleanup.
Signed-off-by: NPeter Oskolkov <posk@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6ae520d

linux/netlink.h: drop unnecessary extern prefix · aa9d6e0f

由 Stephen Hemminger 提交于 12月 20, 2018

Don't need extern prefix before function prototypes.
Checkpatch has complained about this for a couple of years.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa9d6e0f

netfilter: netns: shrink netns_ct struct · 8527f9df

由 Florian Westphal 提交于 12月 18, 2018

remove the obsolete sysctl anchors and move auto_assign_helper_warned
to avoid/cover a hole.  Reduces size by 40 bytes on 64 bit.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

8527f9df

netfilter: conntrack: remove empty pernet fini stubs · fc3893fd

由 Florian Westphal 提交于 12月 18, 2018

after moving sysctl handling into single place, the init functions
can't fail anymore and some of the fini functions are empty.

Remove them and change return type to void.
This also simplifies error unwinding in conntrack module init path.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

fc3893fd

netfilter: conntrack: un-export seq_print_acct · 4b216e21

由 Florian Westphal 提交于 12月 18, 2018

Only one caller, just place it where its needed.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

4b216e21

netfilter: conntrack: udp: only extend timeout to stream mode after 2s · d535c8a6

由 Florian Westphal 提交于 12月 06, 2018

Currently DNS resolvers that send both A and AAAA queries from same source port
can trigger stream mode prematurely, which results in non-early-evictable conntrack entry
for three minutes, even though DNS requests are done in a few milliseconds.

Add a two second grace period where we continue to use the ordinary
30-second default timeout. Its enough for DNS request/response traffic,
even if two request/reply packets are involved.

ASSURED is still set, else conntrack (and thus a possible
NAT mapping ...) gets zapped too in case conntrack table runs full.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d535c8a6

bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c

由 John Fastabend 提交于 12月 20, 2018

A sockmap program that redirects through a kTLS ULP enabled socket
will not work correctly because the ULP layer is skipped. This
fixes the behavior to call through the ULP layer on redirect to
ensure any operations required on the data stream at the ULP layer
continue to be applied.

To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
calling the BPF layer on a redirected message. This is
required to avoid calling the BPF layer multiple times (possibly
recursively) which is not the current/expected behavior without
ULPs. In the future we may add a redirect flag if users _do_
want the policy applied again but this would need to work for both
ULP and non-ULP sockets and be opt-in to avoid breaking existing
programs.

Also to avoid polluting the flag space with an internal flag we
reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with
MSG_WAITFORONE. Here WAITFORONE is specific to recv path and
SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
to verify is user space API is masked correctly to ensure the flag
can not be set by user. (Note this needs to be true regardless
because we have internal flags already in-use that user space
should not be able to set). But for completeness we have two UAPI
paths into sendpage, sendfile and splice.

In the sendfile case the function do_sendfile() zero's flags,

./fs/read_write.c:
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
		   	    size_t count, loff_t max)
 {
   ...
   fl = 0;
#if 0
   /*
    * We need to debate whether we can enable this or not. The
    * man page documents EAGAIN return for the output at least,
    * and the application is arguably buggy if it doesn't expect
    * EAGAIN on a non-blocking file descriptor.
    */
    if (in.file->f_flags & O_NONBLOCK)
	fl = SPLICE_F_NONBLOCK;
#endif
    file_start_write(out.file);
    retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
 }

In the splice case the pipe_to_sendpage "actor" is used which
masks flags with SPLICE_F_MORE.

./fs/splice.c:
 static int pipe_to_sendpage(struct pipe_inode_info *pipe,
			    struct pipe_buffer *buf, struct splice_desc *sd)
 {
   ...
   more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
   ...
 }

Confirming what we expect that internal flags  are in fact internal
to socket side.

Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

0608c69c

bpf: sk_msg, fix socket data_ready events · 552de910

由 John Fastabend 提交于 12月 20, 2018

When a skb verdict program is in-use and either another BPF program
redirects to that socket or the new SK_PASS support is used the
data_ready callback does not wake up application. Instead because
the stream parser/verdict is using the sk data_ready callback we wake
up the stream parser/verdict block.

Fix this by adding a helper to check if the stream parser block is
enabled on the sk and if so call the saved pointer which is the
upper layers wake up function.

This fixes application stalls observed when an application is waiting
for data in a blocking read().

Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

552de910

bpf: skmsg, replace comments with BUILD bug · 7a69c0f2

由 John Fastabend 提交于 12月 20, 2018

Enforce comment on structure layout dependency with a BUILD_BUG_ON
to ensure the condition is maintained.
Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

7a69c0f2

20 12月, 2018 20 次提交

powerpc: use mm zones more sensibly · 25078dc1

由 Christoph Hellwig 提交于 12月 16, 2018

Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
common 64-bit configfs, and ZONE_DMA32 is used for 31-bit schemes.

Move to a scheme closer to what other architectures use (and I dare to
say the intent of the system):

 - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
 - ZONE_NORMAL: everything addressable by the kernel
 - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels

Also provide information on how ZONE_DMA is used by defining
ARCH_ZONE_DMA_BITS.

Contains various fixes from Benjamin Herrenschmidt.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

25078dc1

PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set · 5d32a665

由 Sinan Kaya 提交于 12月 19, 2018

We are compiling PCI code today for systems with ACPI and no PCI
device present. Remove the useless code and reduce the tight
dependency.
Signed-off-by: NSinan Kaya <okaya@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5d32a665

ACPICA: Remove PCI bits from ACPICA when CONFIG_PCI is unset · bd23fac3

由 Sinan Kaya 提交于 12月 19, 2018

Allow ACPI to be built without PCI support in place.
Signed-off-by: NSinan Kaya <okaya@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

bd23fac3

iptunnel: make TUNNEL_FLAGS available in uapi · 1875a9ab

由 wenxu 提交于 12月 19, 2018

ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

For gretap Key example when the command set the id but don't set the
TUNNEL_KEY flags. There is no key field in the send packet

In the lwtunnel situation, some TUNNEL_FLAGS should can be set by
userspace
Signed-off-by: Nwenxu <wenxu@ucloud.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1875a9ab

neighbour: register rtnl doit handler · 82cbb5c6

由 Roopa Prabhu 提交于 12月 19, 2018

this patch registers neigh doit handler. The doit handler
returns a neigh entry given dst and dev. This is similar
to route and fdb doit (get) handlers. Also moves nda_policy
declaration from rtnetlink.c to neighbour.c
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82cbb5c6

net: switch secpath to use skb extension infrastructure · 4165079b

由 Florian Westphal 提交于 12月 18, 2018

Remove skb->sp and allocate secpath storage via extension
infrastructure.  This also reduces sk_buff by 8 bytes on x86_64.

Total size of allyesconfig kernel is reduced slightly, as there is
less inlined code (one conditional atomic op instead of two on
skb_clone).

No differences in throughput in following ipsec performance tests:
- transport mode with aes on 10GB link
- tunnel mode between two network namespaces with aes and null cipher
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4165079b

xfrm: use secpath_exist where applicable · 26912e37

由 Florian Westphal 提交于 12月 18, 2018

Will reduce noise when skb->sp is removed later in this series.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26912e37

net: use skb_sec_path helper in more places · 2294be0f

由 Florian Westphal 提交于 12月 18, 2018

skb_sec_path gains 'const' qualifier to avoid
xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type

same reasoning as previous conversions: Won't need to touch these
spots anymore when skb->sp is removed.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2294be0f

net: move secpath_exist helper to sk_buff.h · 7af8f4ca

由 Florian Westphal 提交于 12月 18, 2018

Future patch will remove skb->sp pointer.
To reduce noise in those patches, move existing helper to
sk_buff and use it in more places to ease skb->sp replacement later.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7af8f4ca

xfrm: change secpath_set to return secpath struct, not error value · 0ca64da1

由 Florian Westphal 提交于 12月 18, 2018

It can only return 0 (success) or -ENOMEM.
Change return value to a pointer to secpath struct.

This avoids direct access to skb->sp:

err = secpath_set(skb);
if (!err) ..
skb->sp-> ...

Becomes:
sp = secpath_set(skb)
if (!sp) ..
sp-> ..

This reduces noise in followup patch which is going to remove skb->sp.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ca64da1

net: convert bridge_nf to use skb extension infrastructure · de8bda1d

由 Florian Westphal 提交于 12月 18, 2018

This converts the bridge netfilter (calling iptables hooks from bridge)
facility to use the extension infrastructure.

The bridge_nf specific hooks in skb clone and free paths are removed, they
have been replaced by the skb_ext hooks that do the same as the bridge nf
allocations hooks did.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de8bda1d

sk_buff: add skb extension infrastructure · df5042f4

由 Florian Westphal 提交于 12月 18, 2018

This adds an optional extension infrastructure, with ispec (xfrm) and
bridge netfilter as first users.
objdiff shows no changes if kernel is built without xfrm and br_netfilter
support.

The third (planned future) user is Multipath TCP which is still
out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.

This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
and MPTCP connection.

Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.

mptcp has same requirements as secpath/bridge netfilter:

1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.

The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
mapping for tx and rx processing.

Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
   are available for this skb.
   This has two purposes.
   a) avoids the need to initialize the pointer.
   b) allows to "delete" an extension by clearing its bit
   value in ->active_extensions.

   While it would be possible to store the active_extensions byte
   in the extension struct instead of sk_buff, there is one problem
   with this:
    When an extension has to be disabled, we can always clear the
    bit in skb->active_extensions.  But in case it would be stored in the
    extension buffer itself, we might have to COW it first, if
    we are dealing with a cloned skb.  On kmalloc failure we would
    be unable to turn an extension off.

2. extension pointer, located at the end of the sk_buff.
   If the active_extensions byte is 0, the pointer is undefined,
   it is not initialized on skb allocation.

This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.

It is possible to add support for extensions that are not preseved on
clones/copies.

To do this, it would be needed to define a bitmask of all extensions that
need copy/cow semantics, and change __skb_ext_copy() to check
->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
->active_extensions to 0 on the new clone.

This isn't done here because all extensions that get added here
need the copy/cow semantics.

v2:
Allocate entire extension space using kmem_cache.
Upside is that this allows better tracking of used memory,
downside is that we will allocate more space than strictly needed in
most cases (its unlikely that all extensions are active/needed at same
time for same skb).
The allocated memory (except the small extension header) is not cleared,
so no additonal overhead aside from memory usage.

Avoid atomic_dec_and_test operation on skb_ext_put()
by using similar trick as kfree_skbmem() does with fclone_ref:
If recount is 1, there is no concurrent user and we can free right away.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df5042f4

netfilter: avoid using skb->nf_bridge directly · c4b0e771

由 Florian Westphal 提交于 12月 18, 2018

This pointer is going to be removed soon, so use the existing helpers in
more places to avoid noise when the removal happens.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4b0e771

ext4: force inode writes when nfsd calls commit_metadata() · fde87268

由 Theodore Ts'o 提交于 12月 19, 2018

Some time back, nfsd switched from calling vfs_fsync() to using a new
commit_metadata() hook in export_operations().  If the file system did
not provide a commit_metadata() hook, it fell back to using
sync_inode_metadata().  Unfortunately doesn't work on all file
systems.  In particular, it doesn't work on ext4 due to how the inode
gets journalled --- the VFS writeback code will not always call
ext4_write_inode().

So we need to provide our own ext4_nfs_commit_metdata() method which
calls ext4_write_inode() directly.

Google-Bug-Id: 121195940
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

fde87268

regmap: irq: add an option to clear status registers on unmask · c82ea33e

由 Bartosz Golaszewski 提交于 12月 19, 2018

Some interrupt controllers whose interrupts are acked on read will set
the status bits for masked interrupts without changing the state of
the IRQ line.

Some chips have an additional "feature" where if those set bits are
not cleared before unmasking their respective interrupts, the IRQ
line will change the state and we'll interpret this as an interrupt
although it actually fired when it was masked.

Add a new field to the irq chip struct that tells the regmap irq chip
code to always clear the status registers before actually changing the
irq mask values.
Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

c82ea33e

soc: fsl: dpio: Add BP and FQ query APIs · e80081c3

由 Roy Pledge 提交于 12月 18, 2018

Add FQ (Frame Queue) and BP (Buffer Pool) query APIs that
users of QBMan can invoke to see the status of the queues
and pools that they are using.
Signed-off-by: NRoy Pledge <roy.pledge@nxp.com>
Signed-off-by: NIoana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e80081c3

regmap: regmap-irq/gpio-max77620: add level-irq support · 1c2928e3

由 Matti Vaittinen 提交于 12月 18, 2018

Add level active IRQ support to regmap-irq irqchip. Change breaks
existing regmap-irq type setting. Convert the existing drivers which
use regmap-irq with trigger type setting (gpio-max77620) to work
with this new approach. So we do not magically support level-active
IRQs on gpio-max77620 - but add support to the regmap-irq for chips
which support them =)

We do not support distinguishing situation where HW supports rising
and falling edge detection but not both. Separating this would require
inventing yet another flags for IRQ types.
Signed-off-by: NMatti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Signed-off-by: NMark Brown <broonie@kernel.org>

1c2928e3

KVM: arm/arm64: Remove arch timer workqueue · 8a411b06

由 Christoffer Dall 提交于 11月 27, 2018

The use of a work queue in the hrtimer expire function for the bg_timer
is a leftover from the time when we would inject interrupts when the
bg_timer expired.

Since we are no longer doing that, we can instead call
kvm_vcpu_wake_up() directly from the hrtimer function and remove all
workqueue functionality from the arch timer code.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

8a411b06

ALSA: HD-Audio: SKL+: force HDaudio legacy or SKL+ driver selection · d82b51c8

由 Pierre-Louis Bossart 提交于 12月 15, 2018

For HDaudio and Skylake drivers, add module parameter "pci_binding"

When pci_binding == 0 (AUTO), the PCI class/subclass info is used to
select drivers based on the presence of the DSP.

pci_binding == 1 (LEGACY) forces the use of the HDAudio legacy driver,
even if the DSP is present.

pci_binding == 2 (ASOC) forces the use of the ASOC driver. The
information on the DSP presence is bypassed.

The value for the module parameter needs to be identical for both
drivers. This parameter is intended as a back-up solution if the
automatic detection fails or when the DSP usage fails. Such cases
should be reported on the alsa-devel mailing list for analysis.
Signed-off-by: NPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

d82b51c8

ALSA: HDA: export process_unsol_events() · 18d43c9b

由 Keyon Jie 提交于 12月 11, 2018

The SOF implementation does not rely on the hdac_bus library, however
for HDMI and HDaudio codec support it does need to deal with
unsolicited events. Instead of re-inventing the wheel, export this
symbol to reuse this part of the library directly.
Signed-off-by: NKeyon Jie <yang.jie@linux.intel.com>
Signed-off-by: NPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

18d43c9b

19 12月, 2018 1 次提交

Revert "x86/objtool: Use asm macros to work around GCC inlining bugs" · 96af6cd0

由 Ingo Molnar 提交于 12月 19, 2018

This reverts commit c06c4d80.

See this commit for details about the revert:

  e769742d ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"")
Reported-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: NBorislav Petkov <bp@alien8.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

96af6cd0

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功