Commits · 90c337da1524863838658078ec34241f45d8394d · openeuler / raspberrypi-kernel

07 Jun, 2015 1 commit

inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations · 90c337da

Authored by Eric Dumazet on Jun 06, 2015

When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge of finding an available
port.
But the kernel does not know yet whether this socket is going to be a
listener or be connected.
It has very limited choices (no full knowledge of the final 4-tuple for a
connect()).

With a limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also, getsockname() shows that the port is still 0 right after bind()
but is properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
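The bind()/getsockname() part of the strace output above can be reproduced with a minimal C sketch. The helper name `bind_addr_no_port` is hypothetical, the fallback value 24 for `IP_BIND_ADDRESS_NO_PORT` is an assumption for libc headers that predate the option, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24 /* assumed value, as in linux/in.h */
#endif

/*
 * Hypothetical helper: bind a TCP socket to `ip` without reserving a port.
 * On success, returns the socket fd and stores the port reported by
 * getsockname() (host byte order) in *port_after_bind. Returns -1 on error.
 */
static int bind_addr_no_port(const char *ip, int *port_after_bind)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0); /* port 0: defer the choice to connect() */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        goto fail;

    /* Set before bind() so that bind() only records the address.
     * IPPROTO_IP is the level that strace prints as SOL_IP. */
    if (setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &one, sizeof(one)) < 0)
        goto fail;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;

    *port_after_bind = ntohs(addr.sin_port);
    return fd;
fail:
    close(fd);
    return -1;
}
```

With the option set, `*port_after_bind` is expected to be 0, matching the first getsockname() call in the trace. A later connect() on the returned fd is what would allocate the source port.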

05 Jun, 2015 11 commits

mpls: Add MPLS entropy label in flow_keys · b3baa0fb

Authored by Tom Herbert on Jun 04, 2015

In flow dissector if an MPLS header contains an entropy label this is
saved in the new keyid field of flow_keys. The entropy label is
then represented in the flow hash function input.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add GRE keyid in flow_keys · 1fdd512c

Authored by Tom Herbert on Jun 04, 2015

In flow dissector if a GRE header contains a keyid this is saved in the
new keyid field of flow_keys. The GRE keyid is then represented
in the flow hash function input.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add IPv6 flow label to flow_keys · 87ee9e52

Authored by Tom Herbert on Jun 04, 2015

In flow_dissector set the flow label in flow_keys for IPv6. This also
removes the short-circuiting of flow dissection when a non-zero label
is present; the flow label can be considered to provide additional
entropy for a hash.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add VLAN ID to flow_keys · d34af823

Authored by Tom Herbert on Jun 04, 2015

In flow_dissector set vlan_id in flow_keys when VLAN is found.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Get rid of IPv6 hash addresses flow keys · 45b47fd0

Authored by Tom Herbert on Jun 04, 2015

We don't need to return the IPv6 address hash as part of flow keys.
In general, using the IPv6 address hash is risky in a hash value
since the underlying use of xor provides no entropy. If someone
really needs the hash value they can get it from the full IPv6
addresses in flow keys (e.g. from flow_get_u32_src).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
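The risk with XOR can be seen in a small sketch: two different addresses whose 32-bit words are merely swapped fold to the same value. Both function names are hypothetical. `xor_fold` is an assumed model of an XOR-based address hash, and `mix_words` is an FNV-1a-style stand-in, not the kernel's jhash2().

```c
#include <stdint.h>

/* XOR-fold a 128-bit address (four 32-bit words) into one word. */
static uint32_t xor_fold(const uint32_t w[4])
{
    return w[0] ^ w[1] ^ w[2] ^ w[3];
}

/*
 * Order-sensitive stand-in mixer (FNV-1a style, applied per 32-bit word).
 * This is NOT jhash2(); it only illustrates mixing every word in sequence.
 */
static uint32_t mix_words(const uint32_t *w, unsigned int n)
{
    uint32_t h = 2166136261u; /* FNV offset basis */
    unsigned int i;

    for (i = 0; i < n; i++)
        h = (h ^ w[i]) * 16777619u; /* FNV prime */
    return h;
}
```

For the word arrays {1, 2, 0, 0} and {2, 1, 0, 0}, `xor_fold` returns 3 in both cases, while `mix_words` keeps the two inputs apart.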

net: Add keys for TIPC address · 9f249089

Authored by Tom Herbert on Jun 04, 2015

Add a new flow key for TIPC addresses.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add full IPv6 addresses to flow_keys · c3f83241

Authored by Tom Herbert on Jun 04, 2015

This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and a selector is used to determine
how many words to input to jhash2.

We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.

With this patch, Ethertype and IP protocol are now included in the
flow hash input.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
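The union-plus-selector idea can be sketched as follows. The type and function names are hypothetical and much simpler than the real flow_keys layout; the point is only that the selector decides how many 32-bit address words go into the hash.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the address part of flow_keys. */
enum addr_type { ADDR_NONE, ADDR_IPV4, ADDR_IPV6 };

struct flow_addrs {
    enum addr_type type; /* selector: which union member is valid */
    union {
        struct { uint32_t src, dst; } v4;
        struct { uint32_t src[4], dst[4]; } v6;
    } u;
};

/* Number of 32-bit address words that would be fed to the hash. */
static unsigned int addr_hash_words(const struct flow_addrs *fa)
{
    switch (fa->type) {
    case ADDR_IPV4:
        return sizeof(fa->u.v4) / sizeof(uint32_t); /* 2 words */
    case ADDR_IPV6:
        return sizeof(fa->u.v6) / sizeof(uint32_t); /* 8 words */
    default:
        return 0;
    }
}
```

Keeping both address families in one union lets a single hashing path serve IPv4 and IPv6 without hashing unused bytes.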

net: Get skb hash over flow_keys structure · 42aecaa9

Authored by Tom Herbert on Jun 04, 2015

This patch changes flow hashing to use jhash2 over the flow_keys
structure instead of just doing jhash_3words over src, dst, and ports.
This method will allow us to take more input into the hashing function
so that we can include full IPv6 addresses, VLAN, flow labels etc.
without needing to resort to xor'ing, which makes for a poor hash.
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove superfluous setting of key_basic · c468efe2

Authored by Tom Herbert on Jun 04, 2015

key_basic is set twice in __skb_flow_dissect, which seems unnecessary.
Remove the second one.
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Simplify GRE case in flow_dissector · ce3b5355

Authored by Tom Herbert on Jun 04, 2015

Do break when we see routing flag or a non-zero version number in GRE
header.
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix build due to missing tc_verd · 94db13fe

Authored by Alexei Starovoitov on Jun 04, 2015

fix build error:
net/core/filter.c: In function 'bpf_clone_redirect':
net/core/filter.c:1429:18: error: 'struct sk_buff' has no member named 'tc_verd'
  if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)

Fixes: 3896d655 ("bpf: introduce bpf_clone_redirect() helper")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Jun, 2015 4 commits

tcp: double default TSQ output bytes limit · c39c4c6a

Authored by Wei Liu on Jun 03, 2015

Xen virtual network driver has higher latency than a physical NIC.
Having only 128K as limit for TSQ introduced 30% regression in guest
throughput.

This patch raises the limit to 256K. This reduces the regression to 8%.
This buys us more time to work out a proper solution in the long run.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove redundant checks · 12e25e10

Authored by Eric Dumazet on Jun 03, 2015

tcp_v4_rcv() checks the following before calling tcp_v4_do_rcv():

if (th->doff < sizeof(struct tcphdr) / 4)
    goto bad_packet;
if (!pskb_may_pull(skb, th->doff * 4))
    goto discard_it;

So the following check in tcp_v4_do_rcv() is redundant
and "goto csum_err;" is wrong anyway.

if (skb->len < tcp_hdrlen(skb) || ...)
	goto csum_err;

A second check can be removed after the no_tcp_socket label for the same reason.

The same tests can be removed in tcp_v6_do_rcv().

Note: short TCP frames are not properly accounted in the tcpInErrs MIB,
because a pskb_may_pull() failure simply drops the incoming skb. We might
fix this in a separate patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops · 7616dcbb

Authored by Scott Feldman on Jun 03, 2015

Clarify in documentation and code that the IPv4 FIB add operation is used
both for adding a new FIB entry to the device and for modifying an existing
FIB entry on the device.

Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: introduce bpf_clone_redirect() helper · 3896d655

Authored by Alexei Starovoitov on Jun 02, 2015

Allow eBPF programs attached to classifier/actions to call
bpf_clone_redirect(skb, ifindex, flags) helper which will
mirror or redirect the packet by dynamic ifindex selection
from within the program to a target device either at ingress
or at egress. Can be used for various scenarios, for example,
to load balance skbs into veths, split parts of the traffic
to local taps, etc.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Jun, 2015 12 commits

batman-adv: Remove unnecessary ret variable in algo_register · f372d090

Authored by Markus Pargmann on Dec 26, 2014

Remove ret variable and all jumps.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: Remove unnecessary ret variable · 9fb6c651

Authored by Markus Pargmann on Dec 26, 2014

We can avoid this indirect return variable by directly returning the
error values.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: main, batadv_compare_eth return bool · f2d5cf2a

Authored by Markus Pargmann on Dec 26, 2014

Declare the return type of batadv_compare_eth as bool.
The function called inside this helper function
(ether_addr_equal_unaligned) also uses bool as return value, so there is
no need to return int.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: main, Convert is_my_mac() to bool · e8ad3b1a

Authored by Markus Pargmann on Dec 26, 2014

It is much clearer to see a bool type as return value than 'int' for
functions that are supposed to return true or false.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: Remove unnecessary check for orig_ifinfo not NULL · a0c77227

Authored by Sven Eckelmann on Mar 01, 2015

orig_ifinfo is dereferenced multiple times in batadv_iv_ogm_update_seqnos
before the check for NULL is done. The function also exits at the
beginning when orig_ifinfo would have been NULL. This makes the check at
the end unnecessary and only confuses the reader/code analyzers.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: types, Fix comment on bcast_own · 21102626

Authored by Markus Pargmann on Dec 26, 2014

batadv_orig_bat_iv->bcast_own is actually not a bitfield, it is an
array. Adjust the comment accordingly.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: iv_ogm, fix comment function name · d491dbb6

Authored by Markus Pargmann on Dec 26, 2014

This is a small copy-paste fix for batadv_ring_buffer_avg.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: iv_ogm, fix coding style · 6c4a1622

Authored by Markus Pargmann on Dec 26, 2014

The kernel coding style says that there should not be multiple
assignments on one line.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: iv_ogm, Fix dup_status comment · 9f52ee19

Authored by Markus Pargmann on Dec 26, 2014

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: iv_ogm_orig_update, style, add missing brackets · 23badd6d

Authored by Markus Pargmann on Dec 26, 2014

CodingStyle describes that either none or both branches of a conditional
have to have brackets.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: iv_ogm_queue_add, Simplify expressions · 56489151

Authored by Markus Pargmann on Dec 26, 2014

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

batman-adv: iv_ogm_aggregate_new, simplify error handling · 940d156f

Authored by Markus Pargmann on Dec 26, 2014

It is just a bit easier to put the error handling at one place and let
multiple error paths use the same calls.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

02 Jun, 2015 6 commits

vlan: Add GRO support for non hardware accelerated vlan · 66e5133f

Authored by Toshiaki Makita on Jun 01, 2015

Currently packets with non-hardware-accelerated vlan cannot be handled
by GRO. This causes low performance for 802.1ad and stacked vlan, as their
vlan tags are currently not stripped by hardware.

This patch adds GRO support for non-hardware-accelerated vlan and
improves their receive performance.

Test Environment:
 vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

Result:

- Before

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    5233.17

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.27     58.03      0.00     41.70      0.00

- After

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    60.00    7586.85

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.50     25.83      0.00     59.53     14.14

[ Register VLAN offloads with priority 10 -DaveM ]
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

vti6: Add pmtu handling to vti6_xmit. · ccd740cb

Authored by Steffen Klassert on May 29, 2015

We currently rely on the PMTU discovery of xfrm.
However, if a packet is locally sent, the PMTU mechanism
of xfrm tries to do local socket notification, which
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: include datapath actions with sampled-packet upcall to userspace · ccea7445

Authored by Neil McKee on May 26, 2015

If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.

This directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.

Adding path information enhances visibility into complex virtual
networks.
Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add priority to packet_offload objects. · bdef7de4

Authored by David S. Miller on Jun 01, 2015

When we scan a packet for GRO processing, we want to see the most
common packet types in the front of the offload_base list.

So add a priority field so we can handle this properly.

IPv4/IPv6 get the highest priority with the implicit zero priority
field.

Next comes ethernet with a priority of 10, and then we have the MPLS
types with a priority of 15.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
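The effect of the priority field can be sketched as a sorted insert into a singly linked list. `struct offload` and `offload_add` are hypothetical, simplified stand-ins for the kernel's packet_offload registration, and keeping equal-priority entries in registration order is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a packet_offload entry. */
struct offload {
    int priority;          /* lower value = scanned earlier */
    const char *name;
    struct offload *next;
};

/*
 * Insert `po` so that the list stays sorted by ascending priority.
 * Entries with equal priority keep their registration order.
 */
static void offload_add(struct offload **head, struct offload *po)
{
    struct offload **pp = head;

    while (*pp && (*pp)->priority <= po->priority)
        pp = &(*pp)->next;
    po->next = *pp;
    *pp = po;
}
```

Registering MPLS (15), IPv4 (0) and ethernet (10) in that order leaves the list as IPv4, ethernet, MPLS, so the most common packet types are checked first.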

Revert "net: core: 'ethtool' issue with querying phy settings" · 18ec898e

Authored by David S. Miller on Jun 01, 2015

This reverts commit f96dee13.

It isn't right, ethtool is meant to manage one PHY instance
per netdevice at a time, and this is selected by the SET
command.  Therefore by definition the GET command must only
return the settings for the configured and selected PHY.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "netfilter: ensure number of counters is >0 in do_replace()" · d26e2c9f

Authored by Bernhard Thaler on May 28, 2015

This partially reverts commit 1086bbe9 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.

Setting rules with ebtables does not work any more with 1086bbe9 in place.

There is an error message and no rules set in the end.

e.g.

~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
   userspace tool doesn't by default support multiple ebtables programs
running

Reverting the ebtables part of 1086bbe9 makes this work again.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

01 Jun, 2015 6 commits

net: dsa: Properly propagate errors from dsa_switch_setup_one · 24595346

Authored by Florian Fainelli on May 29, 2015

While shuffling some code around, dsa_switch_setup_one() was introduced,
and it was modified to return either an error code using ERR_PTR() or a
NULL pointer when running out of memory or failing to setup a switch.

This is a problem for its caller, dsa_switch_setup(), which uses IS_ERR()
and expects to find an error code, not a NULL pointer, so we still try
to proceed with dsa_switch_setup() and operate on invalid memory
addresses. This can be easily reproduced by having e.g. the bcm_sf2
driver built-in, but having no such switch, such that drv->setup will
fail.

Fix this by using ERR_PTR() consistently, which is both more informative
and avoids the need for the caller to use IS_ERR_OR_NULL().

Fixes: df197195 ("net: dsa: split dsa_switch_setup into two functions")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
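Why a NULL return slips past an IS_ERR() check can be shown with simplified userspace re-implementations of the kernel helpers. This is a sketch for illustration only; the real definitions live in the kernel's linux/err.h and differ in detail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error codes occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

Here `IS_ERR(NULL)` is false, so a caller that only tests IS_ERR() treats NULL as a valid pointer, whereas `IS_ERR(ERR_PTR(-ENOMEM))` is true and `PTR_ERR()` recovers `-ENOMEM`.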

tcp: fix child sockets to use system default congestion control if not set · 9f950415

Authored by Neal Cardwell on May 29, 2015

Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.

This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.

In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.

Fixes: 55d8694f ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
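The rule summarized above can be modeled in a few lines of C. `child_cc` and its parameters are hypothetical; the sketch collapses "listener set TCP_CONGESTION" and "route locks the CC module" into a single explicit choice and does not model how the kernel stores either.

```c
#include <stddef.h>

/*
 * Hypothetical model of the selection rule.
 * explicit_cc:    module requested via TCP_CONGESTION on the listener or
 *                 locked by the route, or NULL if neither made a choice.
 * system_default: current value of net.ipv4.tcp_congestion_control.
 */
static const char *child_cc(const char *explicit_cc,
                            const char *system_default)
{
    return explicit_cc ? explicit_cc : system_default;
}
```

The key point is that the default is looked up when the child is created, so a change to the sysctl after the listener was created still takes effect for new connections.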

net/rds Add getsockopt support for SO_RDS_TRANSPORT · 8ba38460

Authored by Sowmini Varadhan on May 29, 2015

The currently attached transport for a PF_RDS socket may be obtained
from user space by invoking getsockopt(2) using the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer optval returned will be one
of the RDS_TRANS_* constants defined in linux/rds.h.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds: Add setsockopt support for SO_RDS_TRANSPORT · d97dac54

Authored by Sowmini Varadhan on May 29, 2015

An application may deterministically attach the underlying transport for
a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer argument to setsockopt must be
one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
must be specified before invoking bind(2) on the socket, and may only
be used once on the socket. An attempt to set the option on a bound
socket, or to invoke the option after a successful SO_RDS_TRANSPORT
attachment, will return EOPNOTSUPP.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
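A minimal sketch of the call sequence described above, assuming a system whose linux/rds.h already exports SO_RDS_TRANSPORT and the RDS_TRANS_* constants. The helper name `rds_socket_over_tcp` is hypothetical, and the function simply returns -1 when RDS is not available.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <linux/rds.h> /* SOL_RDS, SO_RDS_TRANSPORT, RDS_TRANS_* */

/*
 * Hypothetical helper: create a PF_RDS socket and attach the TCP transport
 * before bind(). Returns the fd, or -1 if RDS is unavailable or the
 * setsockopt() call fails.
 */
static int rds_socket_over_tcp(void)
{
    int trans = RDS_TRANS_TCP;
    int fd = socket(PF_RDS, SOCK_SEQPACKET, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT,
                   &trans, sizeof(trans)) < 0) {
        close(fd);
        return -1;
    }
    return fd; /* the caller may now bind(2) the socket */
}
```

Per the description above, a second SO_RDS_TRANSPORT call on the same socket, or a call after bind(2), would be expected to fail with EOPNOTSUPP.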

net/rds: Declare SO_RDS_TRANSPORT and RDS_TRANS_* constants in uapi/linux/rds.h · a28c257c

Authored by Sowmini Varadhan on May 29, 2015

User space applications that desire to explicitly select the
underlying transport for a PF_RDS socket may do so by using the
SO_RDS_TRANSPORT socket option at the SOL_RDS level before bind().
The integer argument provided to the socket option would be one
of the RDS_TRANS_* values, e.g., RDS_TRANS_TCP. This commit exports
the constant values needed by such applications via <linux/rds.h>.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf: allow bpf_ktime_get_ns_proto also for networking · 17ca8cbf

Authored by Daniel Borkmann on May 29, 2015

As this is already exported from tracing side via commit d9847d31
("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might
as well want to move it to the core, so also networking users can make
use of it, e.g. to measure diffs for certain flows from ingress/egress.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>