Commits · 8fb472c09b9df478a062eacc7841448e40fc3c17 · openeuler / raspberrypi-kernel

13 Jan, 2017 2 commits

ipmr: improve hash scalability · 8fb472c0

Authored by Nikolay Aleksandrov on Jan 12, 2017

Recently we started using ipmr with thousands of entries and easily hit
soft lockups on smaller devices. The reason is that the hash function
uses the high order bits from the src and dst, but those don't change in
many common cases, also the hash table is only 64 elements so with
thousands it doesn't scale at all.
This patch migrates the hash table to rhashtable, and in particular the
rhl interface which allows for duplicate elements to be chained because
of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple
duplicate entries to be added with different interfaces (IMO wrong, but
it's been in for a long time).

And here are some results from tests I've run in a VM:
 mr_table size (default, allocated for all namespaces):
  Before                    After
   49304 bytes               2400 bytes

 Add 65000 routes (the diff is much larger on smaller devices):
  Before                    After
   1m42s                     58s

 Forwarding 256 byte packets with 65000 routes (test done in a VM):
  Before                    After
   3 Mbps / ~1465 pps        122 Mbps / ~59000 pps

As a bonus we no longer see the soft lockups on smaller devices which
showed up even with 2000 entries before.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

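To illustrate the scaling problem described in the ipmr commit above, here is a minimal userspace sketch. It is not the kernel's code: toy_high_bits_hash() is a hypothetical 64-bucket hash that, like the one the message describes, looks only at the high-order bits of the source and group address (treated here as host-order integers, an assumption made for simplicity). buckets_used() counts how many buckets a run of consecutive group addresses lands in.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BUCKETS 64u  /* table size mentioned in the commit message */

/* Hypothetical hash: only the top bits of src and dst contribute. */
static unsigned int toy_high_bits_hash(uint32_t src, uint32_t dst)
{
    return ((src >> 24) ^ (dst >> 26)) & (TOY_BUCKETS - 1u);
}

/* Count distinct buckets hit by n consecutive groups from one source. */
static unsigned int buckets_used(uint32_t src, uint32_t first_dst, uint32_t n)
{
    unsigned char seen[TOY_BUCKETS];
    unsigned int used = 0;
    uint32_t i;

    memset(seen, 0, sizeof(seen));
    for (i = 0; i < n; i++) {
        unsigned int b = toy_high_bits_hash(src, first_dst + i);

        if (!seen[b]) {
            seen[b] = 1;
            used++;
        }
    }
    return used;
}
```

With a source of 0x0A000001 and 1000 groups starting at 0xE0000100, every entry falls into the same bucket, so each lookup walks one long chain. That is the kind of behaviour a resizable table hashing the full key is meant to avoid.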

secure_seq: fix sparse errors · c1ce1560

Authored by Eric Dumazet on Jan 11, 2017

Fixes following warnings :

net/core/secure_seq.c:125:28: warning: incorrect type in argument 1
(different base types)
net/core/secure_seq.c:125:28:    expected unsigned int const [unsigned]
[usertype] a
net/core/secure_seq.c:125:28:    got restricted __be32 [usertype] saddr
net/core/secure_seq.c:125:35: warning: incorrect type in argument 2
(different base types)
net/core/secure_seq.c:125:35:    expected unsigned int const [unsigned]
[usertype] b
net/core/secure_seq.c:125:35:    got restricted __be32 [usertype] daddr
net/core/secure_seq.c:125:43: warning: cast from restricted __be16
net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to
integer

Fixes: 7cd23e53 ("secure_seq: use SipHash in place of MD5")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jan, 2017 8 commits

lwt_bpf: bpf_lwt_prog_cmp() can be static · 79471b10

Authored by Wei Yongjun on Jan 12, 2017

Fixes the following sparse warning:

net/core/lwt_bpf.c:355:5: warning:
 symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: allow b/h/w/dw access for bpf's cb in ctx · 62c7989b

Authored by Daniel Borkmann on Jan 12, 2017

When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

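The "aligned b/h/w/dw sized access" rule from the commit above can be pictured with a small check. This is only a sketch and not the verifier's actual logic: cb_access_ok() and CB_BYTES are hypothetical names, and the 20-byte size assumes the cb[] area is five 32-bit words.

```c
#include <stdbool.h>
#include <stddef.h>

#define CB_BYTES 20u  /* assumed size: five 32-bit words of cb[] */

/*
 * Return true if an access of `size` bytes at byte offset `off` into the
 * cb[] area is b/h/w/dw sized, naturally aligned, and in bounds.
 */
static bool cb_access_ok(size_t off, size_t size)
{
    if (size != 1 && size != 2 && size != 4 && size != 8)
        return false;
    if (off % size != 0)
        return false;
    return off < CB_BYTES && size <= CB_BYTES - off;
}
```

Under these assumptions an 8-byte access at offset 0 passes, a 2-byte access at offset 1 fails on alignment, and an 8-byte access at offset 16 fails on bounds.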

bpf: pass original insn directly to convert_ctx_access · 6b8cc1d1

Authored by Daniel Borkmann on Jan 12, 2017

Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: ETH_ALEN as memcpy length for mac addresses · 143c0171

Authored by Ursula Braun on Jan 12, 2017

When creating an SMC connection, there is a CLC (connection layer control)
handshake to prepare for RDMA traffic. The corresponding code is part of
commit 0cfdd8f9 ("smc: connection and link group creation").
Mac addresses to be exchanged in the handshake are copied with a wrong
length of 12 instead of 6 bytes. Following code overwrites the wrongly
copied code, but nevertheless the correct length should already be used for
the preceding mac address copying. Use ETH_ALEN for the memcpy length with
mac addresses.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 0cfdd8f9 ("smc: connection and link group creation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

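A minimal sketch of the pattern the smc commit above fixes, assuming a made-up message layout: struct clc_sketch and clc_set_mac() are hypothetical and do not mirror the real SMC structures. ETH_ALEN is defined locally as 6 so the example builds outside the kernel.

```c
#include <stdint.h>
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6  /* octets in an Ethernet MAC address */
#endif

/* Hypothetical message layout: a MAC followed directly by other fields. */
struct clc_sketch {
    uint8_t mac[ETH_ALEN];
    uint8_t next_field[ETH_ALEN];
};

/* Copy exactly one MAC address; a length of 12 would spill into next_field. */
static void clc_set_mac(struct clc_sketch *msg, const uint8_t *mac)
{
    memcpy(msg->mac, mac, ETH_ALEN);
}
```

Using the named constant instead of a literal length keeps the copy inside the destination field even if the code that fills the following field is later reordered or removed.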

net: fix AF_SMC related typo · 526735dd

Authored by Ursula Braun on Jan 12, 2017

When introducing the new socket family AF_SMC in
commit ac713874 ("smc: establish new socket family"),
a typo in af_family_clock_key_strings has slipped in.
This patch repairs it.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: ac713874 ("smc: establish new socket family")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: Make netif_wake_subqueue a wrapper · 738b35cc

Authored by Florian Fainelli on Jan 11, 2017

netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
does, so make it call it directly after looking up the queue from the index.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Support matching on ARP · 99d31326

Authored by Simon Horman on Jan 11, 2017

Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \
	arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \
	arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow disector: ARP support · 55733350

Authored by Simon Horman on Jan 11, 2017

Allow dissection of (R)ARP operation hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

There are currently no users of FLOW_DISSECTOR_KEY_ARP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the
flower classifier.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Jan, 2017 17 commits

sctp: Fix spelling mistake: "Atempt" -> "Attempt" · eb004603

Authored by Colin Ian King on Jan 10, 2017

Trivial fix to spelling mistake in WARN_ONCE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: Fix multipath selection with vrf · 7a18c5b9

Authored by David Ahern on Jan 10, 2017

fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.

Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.

Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: dsa: Implement ndo_get_phys_port_id" · 592050b2

Authored by Florian Fainelli on Jan 10, 2017

This reverts commit 3a543ef4 ("net: dsa:
Implement ndo_get_phys_port_id") since it misuses the purpose of
ndo_get_phys_port_id(). We have ndo_get_phys_port_name() to do the
correct thing for us now.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Implement ndo_get_phys_port_name() · 44bb765c

Authored by Florian Fainelli on Jan 10, 2017

Return the physical port number of a DSA created network device using
ndo_get_phys_port_name().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig · 73b35147

Authored by Arnd Bergmann on Jan 10, 2017

We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
not right when CONFIG_NET is disabled and there is no socket interface:

warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)

I don't know what the correct solution for this is, but simply removing
the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
'if NET' section avoids the warning and does not produce other build
errors.

Fixes: 483c4933 ("cgroup: Fix CGROUP_BPF config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make "label" property optional for dsa2 · 9f91484f

Authored by Vivien Didelot on Jan 09, 2017

In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
phandles are respectively mandatory and exclusive to CPU port and DSA
link device tree nodes.

Simplify dsa2.c a bit by checking the presence of such phandle instead
of checking the redundant "label" property.

Then the Linux philosophy for Ethernet switch ports is to expose them to
userspace as standard NICs by default. Thus use the standard enumerated
"eth%d" device name if no "label" property is provided for a user port.
This allows to save DTS files from subjective net device names.

If one wants to rename an interface, udev rules can be used as usual.

Of course the current behavior is unchanged, and the optional "label"
property for user ports has precedence over the enumerated name.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: use min_t() in skb_gro_reset_offset() · 7cfd5fd5

Authored by Eric Dumazet on Jan 10, 2017

On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87 ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

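The min()/min_t() distinction in the gro commit above can be sketched in userspace. The kernel's min() insists that both operands have the same type, while min_t() casts both to a named type first. MIN_T_SKETCH and cap_len() below are hypothetical stand-ins, and the sketch assumes the second operand is a pointer difference.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's min_t(): cast both sides, then compare.
 * Note that, unlike the kernel macro, it evaluates its arguments twice. */
#define MIN_T_SKETCH(type, a, b) \
    ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Cap a fragment length by the room left between data and end. */
static unsigned int cap_len(unsigned int frag_len,
                            const unsigned char *data,
                            const unsigned char *end)
{
    /* (end - data) is a ptrdiff_t, not an unsigned int. */
    return MIN_T_SKETCH(unsigned int, frag_len, end - data);
}
```

The explicit type makes the comparison unambiguous regardless of how wide or how signed the pointer difference is on a given architecture.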

mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free · 8c2dd3e4

Authored by Alexander Duyck on Jan 10, 2017

Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent. First
we move the page_frag_ portion of the name to the front of the functions
names. Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this. I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs. Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix. This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gro: Disable frag0 optimization on IPv6 ext headers · 57ea52a8

Authored by Herbert Xu on Jan 10, 2017

The GRO fast path caches the frag0 address.  This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.

This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.

This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.

Fixes: 78a478d0 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: Enter slow-path if there is no tailroom · 1272ce87

Authored by Herbert Xu on Jan 10, 2017

The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978c ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/af_iucv: don't use paged skbs for TX on HiperSockets · dc5367bc

Authored by Julian Wiedmann on Jan 10, 2017

With commit e5374399
("af_iucv: use paged SKBs for big outbound messages"),
we transmit paged skbs for both of AF_IUCV's transport modes
(IUCV or HiperSockets).
The qeth driver for Layer 3 HiperSockets currently doesn't
support NETIF_F_SG, so these skbs would just be linearized again
by the stack.
Avoid that overhead by using paged skbs only for IUCV transport.

cc stable, since this also circumvents a significant skb leak when
sending large messages (where the skb then needs to be linearized).
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: e5374399 ("af_iucv: use paged SKBs for big outbound messages")
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: pdiag_put_ring() should return TX_RING info for TPACKET_V3 · a505e582

Authored by Sowmini Varadhan on Jan 10, 2017

Commit 7f953ab2 ("af_packet: TX_RING support for TPACKET_V3")
now makes it possible to use TX_RING with TPACKET_V3, so make
the relevant information available via 'ss -e -a --packet'
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add the AF_QIPCRTR entries to family name tables · 5d722b30

Authored by Anna, Suman on Jan 09, 2017

Commit bdabad3e ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.

Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Mark 'buf' as little endian · 3512a1ad

Authored by Stephen Boyd on Jan 09, 2017

Failure to mark this pointer as __le32 causes checkers like
sparse to complain:

net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident>

Silence it.

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Ensure validity of dst->ds[0] · faf3a932

Authored by Florian Fainelli on Jan 09, 2017

It is perfectly possible to have non zero indexed switches being present
in a DSA switch tree, in such a case, we will be dereferencing a NULL
pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
and ensure that dst->ds[0] is valid before doing anything with it.

Fixes: 0c73c523 ("net: dsa: Initialize CPU port ethtool ops per tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_flow_get_be16() can be static · d9584d8c

Authored by Eric Dumazet on Jan 09, 2017

Removes following sparse complain :

net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?

Fixes: 972d3876 ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: socket: Make unnecessarily global sockfs_setattr() static · dc647ec8

Authored by Tobias Klauser on Jan 10, 2017

Make sockfs_setattr() static as it is not used outside of net/socket.c

This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

Fixes: 86741ec2 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Jan, 2017 13 commits

net: dsa: select NET_SWITCHDEV · 3a89eaa6

Authored by Vivien Didelot on Jan 09, 2017

The support for DSA Ethernet switch chips depends on TCP/IP networking,
thus explicit that HAVE_NET_DSA depends on INET.

DSA uses SWITCHDEV, thus select it instead of depending on it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: make TCP_INFO more consistent · b369e7fd

Authored by Eric Dumazet on Jan 09, 2017

tcp_get_info() has to lock the socket, so lets lock it
for an extended critical section, so that various fields
have consistent values.

This solves an annoying issue that some applications
reported when multiple counters are updated during one
particular rx/rx event, and TCP_INFO was called from
another cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: rename ARG_PTR_TO_STACK · 39f19ebb

Authored by Alexei Starovoitov on Jan 09, 2017

since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not export tcp_peer_is_proven() · 6bb629db

Authored by Eric Dumazet on Jan 09, 2017

After commit 1fb6f159 ("tcp: add tcp_conn_request"),
tcp_peer_is_proven() no longer needs to be exported.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int · b007f090

Authored by Pavel Tikhomirov on Jan 09, 2017

> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

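The transcript in the tcp_notsent_lowat commit above is consistent with an unsigned value being shown through a signed-integer handler. The sketch below reproduces only that formatting effect in userspace. format_as_int() and format_as_uint() are hypothetical helpers, and the result assumes a 32-bit two's-complement int.

```c
#include <stdio.h>
#include <string.h>

/* Format the bits of an unsigned int as if they were a signed int. */
static int format_as_int(unsigned int value, char *out, size_t len)
{
    int as_signed;

    memcpy(&as_signed, &value, sizeof(as_signed));
    return snprintf(out, len, "%d", as_signed);
}

/* Format the same value with an unsigned conversion. */
static int format_as_uint(unsigned int value, char *out, size_t len)
{
    return snprintf(out, len, "%u", value);
}
```

Passing 4294967295 to the first helper yields the string "-1", matching what `cat` printed before the fix, while the second yields "4294967295", which is presumably what the switch to proc_douintvec makes visible.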

ipv6: fix typos · 67c408cf

Authored by Alexander Alemayhu on Jan 07, 2017

o s/approriate/appropriate
o s/discouvery/discovery
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: netlink interface for SMC sockets · f16a7dd5