Commits · e6727f39004bd95725342b3b343a14c7d59df07f · openanolis / cloud-kernel

Jan 10, 2017 · 24 commits

smc: send data (through RDMA) · e6727f39

Committed by Ursula Braun on Jan 09, 2017

copy data to kernel send buffer, and trigger RDMA write
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: connection data control (CDC) · 5f08318f

Committed by Ursula Braun on Jan 09, 2017

send and receive CDC messages (via IB message send and CQE)
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: link layer control (LLC) · 9bf9abea

Committed by Ursula Braun on Jan 09, 2017

send and receive LLC messages CONFIRM_LINK (via IB message send and CQE)
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR · bd4ad577

Committed by Ursula Braun on Jan 09, 2017

Prepare the link for RDMA transport:
Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: work request (WR) base for use by LLC and CDC · f38ba179

Committed by Ursula Braun on Jan 09, 2017

The base containers for RDMA transport are work requests and completion
queue entries processed through Infiniband verbs:
* allocate and initialize these areas
* map these areas to DMA
* implement the basic communication consisting of work request posting
  and reception of completion queue events
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: remote memory buffers (RMBs) · cd6851f3

Committed by Ursula Braun on Jan 09, 2017

* allocate data RMB memory for sending and receiving
* size depends on the maximum socket send and receive buffers
* allocated RMBs are kept during life time of the owning link group
* map the allocated RMBs to DMA
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: connection and link group creation · 0cfdd8f9

Committed by Ursula Braun on Jan 09, 2017

* create smc_connection for SMC-sockets
* determine suitable link group for a connection
* create a new link group if necessary
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: CLC handshake (incl. preparation steps) · a046d57d

Committed by Ursula Braun on Jan 09, 2017

* CLC (Connection Layer Control) handshake
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: establish pnet table management · 6812baab

Committed by Thomas Richter on Jan 09, 2017

Connection creation with SMC-R starts through an internal
TCP-connection. The Ethernet interface for this TCP-connection is not
restricted to the Ethernet interface of a RoCE device. Any existing
Ethernet interface belonging to the same physical net can be used, as
long as there is a defined relation between the Ethernet interface and
some RoCE devices. This relation is defined with the help of an
identification string called "Physical Net ID", or "pnet ID" for short.
Information about defined pnet IDs and their related Ethernet
interfaces and RoCE devices is stored in the SMC-R pnet table.

A pnet table entry consists of the identifying pnet ID and the
associated network and IB device.
This patch adds pnet table configuration support using the
generic netlink message interface referring to network and IB device
by their names. Commands exist to add, delete, and display pnet table
entries, and to flush or display the entire pnet table.

There are cross-checks to verify whether the ethernet interfaces
or infiniband devices really exist in the system. If either device
is not available, the pnet ID entry is not created.
Loss of network devices and IB devices is also monitored;
a pnet ID entry is removed when an associated network or
IB device is removed.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: introduce SMC as an IB-client · a4cf0443

Committed by Ursula Braun on Jan 09, 2017

* create a list of SMC IB-devices
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: establish new socket family · ac713874

Committed by Ursula Braun on Jan 09, 2017

* enable smc module loading and unloading
* register new socket family
* basic smc socket creation and deletion
* use backing TCP socket to run CLC (Connection Layer Control)
  handshake of SMC protocol
* Setup for infiniband traffic is implemented in follow-on patches.
  For now fallback to TCP socket is always used.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce keepalive function in struct proto · 4b9d07a4

Committed by Ursula Braun on Jan 09, 2017

Calling tcp_set_keepalive() directly from the protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering, and the newly introduced protocol (SMC-R) will need its own
keepalive function. Therefore, add a "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

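The layering idea in this commit can be pictured with a small user-space sketch. It is not the kernel code: `struct proto_sketch`, `struct sock_sketch`, `set_keepalive_generic()` and the two example protocols are hypothetical names made up for illustration. Only the idea, that generic code calls an optional per-protocol `keepalive` pointer instead of a TCP function by name, comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct sock_sketch;

/* Hypothetical, heavily simplified stand-in for a per-protocol ops table. */
struct proto_sketch {
    const char *name;
    /* Optional hook; NULL means the protocol has nothing extra to do. */
    void (*keepalive)(struct sock_sketch *sk, int val);
};

/* Hypothetical, heavily simplified stand-in for a socket. */
struct sock_sketch {
    const struct proto_sketch *prot;
    int keepopen;     /* generic flag owned by the protocol-agnostic layer */
    int timer_armed;  /* protocol-private state, touched only by the hook */
};

/* Example hook for a TCP-like protocol: arm or disarm its keepalive timer. */
void tcp_like_keepalive(struct sock_sketch *sk, int val)
{
    sk->timer_armed = val ? 1 : 0;
}

const struct proto_sketch tcp_like = { "tcp-like", tcp_like_keepalive };
const struct proto_sketch raw_like = { "raw-like", NULL };

/* Generic option handler: it never names a specific protocol. */
void set_keepalive_generic(struct sock_sketch *sk, int val)
{
    assert(sk != NULL && sk->prot != NULL);
    if (sk->prot->keepalive)
        sk->prot->keepalive(sk, val);
    sk->keepopen = val ? 1 : 0;
}
```

With this shape, a new protocol only has to fill in its own hook; the generic handler stays unchanged.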

net: for rate-limited ICMP replies save one atomic operation · 7ba91ecb

Committed by Jesper Dangaard Brouer on Jan 09, 2017

It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.

The icmp_global_allow() call states it must be called with BH
disabled.  This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock.  Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: reduce cycles spend on ICMP replies that gets rate limited · c0303efe

Committed by Jesper Dangaard Brouer on Jan 09, 2017

This patch splits the global and per-(inet)peer ICMP-reply limiter
code, and moves the global limit check earlier in the packet
processing path.  This avoids spending cycles on ICMP replies that
get limited/suppressed anyhow.

The global ICMP rate limiter icmp_global_allow() is a good solution;
it just happens too late in the process.  The kernel goes through the
full route lookup (return path) for the ICMP message before making
the rate-limit decision not to send the ICMP reply.

Details: The kernel's global rate limiter for ICMP messages was added
in commit 4cdf507d ("icmp: add a global rate limitation").  It is
a token bucket limiter with a global lock.  It brilliantly avoids
lock contention by only updating when 20ms (HZ/50) have elapsed. It
can then avoid taking the lock when credit is exhausted (when under
pressure) and the time constraint for refill is not yet met.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

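The token bucket behaviour described in this commit message can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel's icmp_global_allow(): the names `token_bucket`, `bucket_allow()` and `REFILL_MIN_MS` are hypothetical, time is passed in explicitly as milliseconds, and the global lock is only marked by a comment because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REFILL_MIN_MS 20u  /* refill at most once per 20 ms, as in the text */

/* Hypothetical limiter state. */
struct token_bucket {
    uint32_t credit;        /* tokens currently available */
    uint32_t stamp_ms;      /* time of the last refill */
    uint32_t rate_per_sec;  /* tokens added per second */
    uint32_t burst;         /* upper bound on stored tokens */
};

/* Returns true if one message may be sent at time now_ms. */
bool bucket_allow(struct token_bucket *tb, uint32_t now_ms)
{
    uint32_t delta, credit, incr = 0;

    assert(tb != NULL);
    delta = now_ms - tb->stamp_ms;
    if (delta > 1000u)
        delta = 1000u;  /* never credit more than one second at once */

    /* Fast path: bucket is empty and it is too early to refill,
     * so a locked implementation could return without taking the lock. */
    if (tb->credit == 0 && delta < REFILL_MIN_MS)
        return false;

    /* A multi-threaded limiter would take its lock here. */
    if (delta >= REFILL_MIN_MS) {
        incr = tb->rate_per_sec * delta / 1000u;
        if (incr)
            tb->stamp_ms = now_ms;
    }
    credit = tb->credit + incr;
    if (credit > tb->burst)
        credit = tb->burst;
    if (credit == 0)
        return false;
    tb->credit = credit - 1;  /* spend one token for this message */
    return true;
}
```

Starting with 2 tokens, a rate of 1000 per second and a burst of 5, two calls succeed, calls made before 20 ms have elapsed are rejected on the fast path, and the call at 20 ms refills the bucket up to the burst limit.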

Revert "icmp: avoid allocating large struct on stack" · 8d9ba388

Committed by Jesper Dangaard Brouer on Jan 09, 2017

This reverts commit 9a99d4a5 ("icmp: avoid allocating large struct
on stack"), because struct icmp_bxm is not really a large struct, and
allocating and freeing this small 112-byte struct hurts performance.

Fixes: 9a99d4a5 ("icmp: avoid allocating large struct on stack")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Make dsa_switch_ops const · a82f67af

Committed by Florian Fainelli on Jan 08, 2017

Now that we have properly encapsulated and made drivers utilize exported
functions, we can switch dsa_switch_ops to be annotated with const.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Encapsulate legacy switch drivers into dsa_switch_driver · ab3d408d

Committed by Florian Fainelli on Jan 08, 2017

In preparation for making struct dsa_switch_ops const, encapsulate it
within a dsa_switch_driver which has a list pointer and a pointer to
dsa_switch_ops. This allows us to take the list_head pointer out of
dsa_switch_ops, which is written to by {un,}register_switch_driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

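The encapsulation pattern from this commit can be sketched outside the kernel like this. All names (`switch_ops_sketch`, `switch_driver_sketch`, `register_driver_sketch()` and so on) are hypothetical, and a plain singly linked list stands in for the kernel's list_head. The point being illustrated is only that the mutable list linkage lives in a wrapper, so the operations table itself can be const.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table: never written to after definition. */
struct switch_ops_sketch {
    const char *(*probe)(void);
};

/* Hypothetical wrapper: mutable list linkage plus a pointer to const ops. */
struct switch_driver_sketch {
    struct switch_driver_sketch *next;
    const struct switch_ops_sketch *ops;
};

static struct switch_driver_sketch *driver_list;

void register_driver_sketch(struct switch_driver_sketch *drv)
{
    assert(drv != NULL && drv->ops != NULL);
    drv->next = driver_list;  /* only the wrapper is modified */
    driver_list = drv;
}

void unregister_driver_sketch(struct switch_driver_sketch *drv)
{
    struct switch_driver_sketch **pp = &driver_list;

    while (*pp && *pp != drv)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = drv->next;
        drv->next = NULL;
    }
}

size_t registered_count(void)
{
    size_t n = 0;

    for (const struct switch_driver_sketch *d = driver_list; d; d = d->next)
        n++;
    return n;
}

/* Example driver: the ops table can live in read-only storage. */
const char *demo_probe(void) { return "demo-switch"; }
const struct switch_ops_sketch demo_ops = { demo_probe };
struct switch_driver_sketch demo_driver = { NULL, &demo_ops };
```

Registering and unregistering `demo_driver` touches only `demo_driver.next`; `demo_ops` is never written.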

net/sched: act_csum: compute crc32c on SCTP packets · c008b33f

Committed by Davide Caratti on Jan 09, 2017

modify act_csum to compute crc32c on IPv4/IPv6 packets having SCTP in
their payload, and extend UAPI definitions accordingly.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

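For readers unfamiliar with crc32c, here is a minimal bitwise sketch of the CRC-32C (Castagnoli) checksum over a byte buffer. It is a generic user-space illustration, not the act_csum code and not the kernel's crc32c library; `crc32c_sketch()` is a hypothetical name, and how the result is placed into an SCTP header is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: reflected polynomial 0x82F63B78, initial value and
 * final XOR of 0xFFFFFFFF. Slow but easy to follow. */
uint32_t crc32c_sketch(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and fold in the polynomial. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check input "123456789" yields 0xE3069283 with these parameters, which is a convenient way to validate any faster table-driven or hardware-assisted variant.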

net/sched: Kconfig: select LIBCRC32C if NET_ACT_CSUM is selected · ab9d226e

Committed by Davide Caratti on Jan 09, 2017

LIBCRC32C is needed to compute crc32c on SCTP packets.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_u32: don't bother explicitly initializing ->divisor to zero · 58fa118f

Committed by Alexandru Moise on Jan 08, 2017

This struct member is already initialized to zero upon root_ht's
allocation via kzalloc().
Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

syncookies: use SipHash in place of SHA1 · fe62d05b

Committed by Jason A. Donenfeld on Jan 08, 2017

SHA1 is slower and less secure than SipHash, and so replacing syncookie
generation with SipHash makes natural sense. Some BSDs have been doing
this for several years in fact.

The speedup should be similar to, and even more impressive than, the
speedup from the sequence number fix in this series.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

secure_seq: use SipHash in place of MD5 · 7cd23e53

Committed by Jason A. Donenfeld on Jan 08, 2017

This gives a clear speed and security improvement. SipHash is both
faster and more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, for IPv6, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code. For IPv4, we pass the values directly to the
short input convenience functions.

64-bit x86_64:
[    1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
[    1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
[    1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
[    1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526

32-bit x86:
[    1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
[    1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
[    1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
[    1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: remove disable of bottom half in inet_rtm_getroute · eafea739

Committed by David Ahern on Jan 07, 2017

Nothing about the route lookup requires bottom half to be disabled.
Remove the local_bh_disable ... local_bh_enable around ip_route_input.
This appears to be a vestige of days gone by as it has been there
since the beginning of git time.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: change init_inodecache() return void · 1e911632

Committed by yuan linyu on Jan 07, 2017

sock_init() calls it but does not check its return value,
so change it to return void and add an internal BUG_ON() check.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jan 09, 2017 · 9 commits

rxrpc: Allow listen(sock, 0) to be used to disable listening · 210f0353

Committed by David Howells on Jan 05, 2017

Allow listen() with a backlog of 0 to be used to disable listening on an
AF_RXRPC socket.  This also releases any preallocation, thereby making it
easier for a kernel service to account for all allocated call structures
when shutting down the service.

The socket cannot thereafter have listening reenabled, but must rather be
closed and reopened.
Signed-off-by: David Howells <dhowells@redhat.com>

net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905

Committed by Willem de Bruijn on Jan 07, 2017

The tc_from field fulfills two roles. It encodes whether a packet was
redirected by an act_mirred device and, if so, whether act_mirred was
called on ingress or egress. Split it into separate fields.

The information is needed by the special IFB loop, where packets are
taken out of the normal path by act_mirred, forwarded to IFB, then
reinjected at their original location (ingress or egress) by IFB.

The IFB device cannot use skb->tc_at_ingress, because that may have
been overwritten as the packet travels from act_mirred to ifb_xmit,
when it passes through tc_classify on the IFB egress path. Cache this
value in skb->tc_from_ingress.

That field is valid only if a packet arriving at ifb_xmit came from
act_mirred. Other packets can be crafted to reach ifb_xmit. These
must be dropped. Set tc_redirected on redirection and drop all packets
that do not have this bit set.

Both fields are set only on cloned skbs in tc actions, so original
packet sources do not have to clear the bit when reusing packets
(notably, pktgen and octeon).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

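The split into two single-bit fields can be illustrated with a small user-space sketch. The struct and function names here (`skb_sketch`, `mark_redirected()`, `ifb_like_xmit()`) are hypothetical and the logic is reduced to the two rules stated in the commit message: cache the ingress/egress origin at redirect time, and drop anything that does not carry the redirected bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet metadata with three single-bit flags. */
struct skb_sketch {
    unsigned int tc_at_ingress:1;    /* current hook; may be overwritten later */
    unsigned int tc_redirected:1;    /* set only when a redirect action ran */
    unsigned int tc_from_ingress:1;  /* cached origin, valid if tc_redirected */
};

/* What a redirecting action might do to its clone of the packet. */
struct skb_sketch mark_redirected(struct skb_sketch clone)
{
    clone.tc_redirected = 1;
    clone.tc_from_ingress = clone.tc_at_ingress;
    return clone;
}

enum ifb_verdict { IFB_DROP, IFB_REINJECT_INGRESS, IFB_REINJECT_EGRESS };

/* What an IFB-like device might decide when a packet reaches it. */
enum ifb_verdict ifb_like_xmit(const struct skb_sketch *skb)
{
    assert(skb != NULL);
    if (!skb->tc_redirected)
        return IFB_DROP;  /* did not come from a redirect action */
    return skb->tc_from_ingress ? IFB_REINJECT_INGRESS : IFB_REINJECT_EGRESS;
}
```

Because `mark_redirected()` works on a copy, the original packet never has the bit set, which mirrors the remark that original packet sources do not need to clear it.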

net-tc: convert tc_at to tc_at_ingress · 8dc07fdb

Committed by Willem de Bruijn on Jan 07, 2017

Field tc_at is used only within tc actions to distinguish ingress from
egress processing. A single bit is sufficient for this purpose.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tc: convert tc_verd to integer bitfields · a5135bcf

Committed by Willem de Bruijn on Jan 07, 2017

Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tc: extract skip classify bit from tc_verd · e7246e12

Committed by Willem de Bruijn on Jan 07, 2017

Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-tc: make MAX_RECLASSIFY_LOOP local · d6264071

Committed by Willem de Bruijn on Jan 07, 2017

This field is no longer kept in tc_verd. Remove it from the global
definition of that struct.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make ndo_get_stats64 a void function · bc1f4470

Committed by stephen hemminger on Jan 06, 2017

The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: Remove flow arg from ip_mkroute_input · dc33da59

Committed by David Ahern on Jan 06, 2017

fl4 arg is not used; remove it.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: Remove nowait arg to ipmr_get_route · 9f09eaea

Committed by David Ahern on Jan 06, 2017

ipmr_get_route has 1 caller and the nowait arg is 0. Remove the arg and
simplify ipmr_get_route accordingly.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jan 08, 2017 · 1 commit

net: dsa: move HWMON support to its own file · 111427f6

Committed by Vivien Didelot on Jan 06, 2017

Isolate the HWMON support in DSA in its own file. Currently only the
legacy DSA code is concerned.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jan 07, 2017 · 6 commits

netlabel: add CALIPSO to the list of built-in protocols · bcd5e1a4

Committed by Paul Moore on Jan 06, 2017

When we added CALIPSO support in Linux v4.8 we forgot to add it to the
list of supported protocols which we display at boot.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: rework socket comparison in __l2tp_ip*_bind_lookup() · c5fdae04

Committed by Guillaume Nault on Jan 06, 2017

Split conditions, so that each test becomes clearer.

Also, for l2tp_ip, check if "laddr" is 0. This prevents a socket from
binding to the unspecified address when other sockets are already bound
using the same device (if any), connection ID and namespace.

Same thing for l2tp_ip6: add ipv6_addr_any(laddr) and
ipv6_addr_any(raddr) tests to ensure that an IPv6 unspecified address
passed as parameter is properly treated as a wildcard.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

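The "one test per condition, unspecified address acts as a wildcard" idea can be sketched for the IPv4 case as below. This is a simplified illustration, not the l2tp_ip code: `bound_sock_sketch` and `bind_conflicts()` are hypothetical names, addresses are plain 32-bit integers with 0 meaning unspecified, and the network namespace check is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an already bound socket. */
struct bound_sock_sketch {
    uint32_t laddr;    /* local address, 0 = unspecified */
    int dev_if;        /* bound device index, 0 = none */
    uint32_t conn_id;  /* connection ID */
};

/* Returns true if a new bind request collides with an existing socket. */
bool bind_conflicts(const struct bound_sock_sketch *sk,
                    uint32_t laddr, int dif, uint32_t conn_id)
{
    assert(sk != NULL);

    /* Devices differ only if both are set and unequal. */
    if (sk->dev_if && dif && sk->dev_if != dif)
        return false;

    /* Addresses differ only if both are specified and unequal;
     * an unspecified address on either side matches everything. */
    if (sk->laddr && laddr && sk->laddr != laddr)
        return false;

    if (sk->conn_id != conn_id)
        return false;

    return true;
}
```

With an existing socket bound to a specific address and connection ID 7, a request for the unspecified address with the same connection ID is reported as a conflict, while a request for a different specific address is not.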

l2tp: remove useless NULL check in __l2tp_ip*_bind_lookup() · 986f7cbc

Committed by Guillaume Nault on Jan 06, 2017

If "l2tp" was NULL, that'd mean "sk" is NULL too. This can't happen
since "sk" is returned by sk_for_each_bound().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: make __l2tp_ip*_bind_lookup() parameters 'const' · bb39b0bd

Committed by Guillaume Nault on Jan 06, 2017

Add const qualifier wherever possible for __l2tp_ip_bind_lookup() and
__l2tp_ip6_bind_lookup().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove redundant addr_len check in l2tp_ip_bind() · 8cf2f704

Committed by Guillaume Nault on Jan 06, 2017

addr_len's value has already been verified at this point.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: validate the requested traces user input against max supported · 780e9829

Committed by santosh.shilimkar@oracle.com on Jan 06, 2017

A larger-than-supported value can lead to array read/write overflow.
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

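The class of bug fixed here can be shown with a generic bounds check on a user-supplied count. This sketch is not the RDS code: `MAX_TRACES_SKETCH`, `trace_cfg_sketch` and `set_traces_sketch()` are hypothetical, and the choice of `-EINVAL` as the error value is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRACES_SKETCH 3  /* hypothetical capacity of the trace array */

/* Hypothetical per-socket trace configuration. */
struct trace_cfg_sketch {
    unsigned char points[MAX_TRACES_SKETCH];
    size_t count;
};

/* Copies nr_req requested trace points into cfg.
 * Returns 0 on success, or -EINVAL if the request exceeds the capacity. */
int set_traces_sketch(struct trace_cfg_sketch *cfg,
                      const unsigned char *req, size_t nr_req)
{
    assert(cfg != NULL && req != NULL);

    /* Validate before any array access: without this check, a large
     * nr_req would read and write past the end of points[]. */
    if (nr_req > MAX_TRACES_SKETCH)
        return -EINVAL;

    memcpy(cfg->points, req, nr_req);
    cfg->count = nr_req;
    return 0;
}
```

The key design point is that the limit is checked once, at the boundary where untrusted input enters, so later loops over `count` can rely on it.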
