Commits · 4bcb877d257c87298aedead1ffeaba0d5df1991d · openanolis / cloud-kernel

06 Nov, 2014 2 commits

udp: Offload outer UDP tunnel csum if available · 4bcb877d

Authored by Tom Herbert on Nov 04, 2014

In __skb_udp_tunnel_segment, if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move fou_build_header into fou.c and refactor · 63487bab

Authored by Tom Herbert on Nov 04, 2014

Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Nov, 2014 15 commits

udp: remove blank line between set and test · 6cf1093e

Authored by Fabian Frederick on Nov 04, 2014

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: trivial, add bracket for the if block · 869ba988

Authored by Florent Fourcot on Nov 04, 2014

The "else" block spans several lines and uses brackets.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

esp4: remove assignment in if condition · 05006e8c

Authored by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow setting ecn via routing table · f7b3bec6

Authored by Florian Westphal on Nov 03, 2014

This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.
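
The decision described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct route_sketch` and `ecn_requested()` are hypothetical names, not kernel symbols, and the route is reduced to a single flag standing in for the `features ecn` setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a route entry; only the ECN feature is modelled. */
struct route_sketch {
	bool feature_ecn;	/* assumed to be set by "features ecn" on the route */
};

/*
 * Decide whether an outgoing connection should request ECN.
 * tcp_ecn == 1 requests ECN for every destination; any other value
 * defers to the per-route feature, if a route is known.
 */
bool ecn_requested(int sysctl_tcp_ecn, const struct route_sketch *rt)
{
	if (sysctl_tcp_ecn == 1)
		return true;
	return rt != NULL && rt->feature_ecn;
}
```

With this shape, a host running with tcp_ecn=2 or tcp_ecn=0 still requests ECN toward destinations whose route carries the feature, while all other destinations follow the global setting.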

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
due to commit 255cac91 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); on-path connectivity breakage was found in about
1 in 10,000 cases. Timeouts rather than received RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): of the 12 thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
while _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

syncookies: split cookie_check_timestamp() into two functions · f1673381

Authored by Florian Westphal on Nov 03, 2014

The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.

We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.

While this is the natural place for decoding options such as ECN which
commit 172d69e6 ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.

Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.

This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.

This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.
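
A minimal sketch of the check described above, including the per-route follow-up it prepares for. The function name, its flat parameter list, and the position of the ECN flag in the echoed timestamp are all assumptions for illustration; the real function operates on kernel structures that are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the ECN flag inside the echoed timestamp. */
#define TS_ECN_FLAG_SKETCH (1u << 5)

/*
 * Returns true when ECN may stay enabled for a connection rebuilt
 * from a syncookie. A false result only turns ECN off; it does not
 * drop the request.
 */
bool cookie_ecn_ok_sketch(uint32_t echoed_ts, int sysctl_tcp_ecn,
			  bool route_has_ecn)
{
	if (!(echoed_ts & TS_ECN_FLAG_SKETCH))
		return false;		/* ECN was never negotiated */
	if (sysctl_tcp_ecn)
		return true;		/* globally enabled */
	return route_has_ecn;		/* per-route setting from the later lookup */
}
```

Because the route flag is passed in, the caller can reuse the route lookup it already performs and no extra lookup is needed.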

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

syncookies: avoid magic values and document which-bit-is-what-option · 274e2da0

Authored by Florian Westphal on Nov 03, 2014

Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.
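
To illustrate what replacing magic shifts with defines can look like, here is a sketch that packs TCP options into the low bits of a timestamp. The macro names and the bit layout (four bits of window scale, one SACK bit, one ECN bit) are assumptions made for this example and are not quoted from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the low timestamp bits (illustrative only). */
#define TS_OPT_WSCALE_MASK	0xfu		/* bits 0-3: window scale */
#define TS_OPT_SACK		(1u << 4)	/* bit 4: SACK permitted */
#define TS_OPT_ECN		(1u << 5)	/* bit 5: ECN negotiated */
#define TSBITS			6		/* number of option bits */
#define TSMASK			((1u << TSBITS) - 1)

/* Replace the low TSBITS bits of tsval with the encoded options. */
uint32_t ts_encode_options(uint32_t tsval, unsigned int wscale,
			   bool sack, bool ecn)
{
	uint32_t options = wscale & TS_OPT_WSCALE_MASK;

	if (sack)
		options |= TS_OPT_SACK;
	if (ecn)
		options |= TS_OPT_ECN;
	return (tsval & ~TSMASK) | options;
}
```

Decoding is the mirror image: mask with `TS_OPT_WSCALE_MASK` for the window scale and test the two flag bits.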

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

igmp: remove camel case definitions · 436f7c20

Authored by Fabian Frederick on Nov 04, 2014

Use standard uppercase for definitions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: remove else after return · c18450a5

Authored by Fabian Frederick on Nov 04, 2014

else is unnecessary after return 0 in __udp4_lib_rcv()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: remove inline on static in c file · aa1f731e

Authored by Fabian Frederick on Nov 04, 2014

Remove __inline__ / inline and let the compiler decide what to do
with static functions.

Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: remove 0/NULL assignment on static · 0d3979b9

Authored by Fabian Frederick on Nov 04, 2014

Static values are automatically initialized to 0.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use seq_puts instead of seq_printf where possible · c9f503b0

Authored by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: spelling s/plugable/pluggable · b92022f3

Authored by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

cipso: remove NULL assignment on static · 988b1343

Authored by Fabian Frederick on Nov 04, 2014

Also add a blank line after structure declarations.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: include linux/bug.h instead of asm/bug.h · 4c787b16

Authored by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

cipso: kerneldoc warning fix · 4973404f

Authored by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Nov, 2014 2 commits

net: add rbnode to struct sk_buff · 56b17425

Authored by Eric Dumazet on Nov 03, 2014

Yaogong replaces the TCP out-of-order receive queue with an RB tree.

As netem already does a private skb->{next/prev/tstamp} union
with a 'struct rb_node', let's do this in a cleaner way.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: less interrupt masking in NAPI · d75b1ade

Authored by Eric Dumazet on Nov 02, 2014

net_rx_action() can mask irqs a single time to transfer sd->poll_list
into a private list, for a very short duration.

Then, napi_complete() can avoid masking irqs again,
and net_rx_action() only needs to mask irqs again in the slow path.

This patch removes two pairs of irq mask/unmask per typical NAPI run,
more if multiple NAPIs were triggered.

Note this also allows giving control back to the caller (do_softirq())
more often, so that other softirq handlers can be called a bit earlier,
or ksoftirqd can be woken up earlier under pressure.

This was developed while testing an alternative to RX interrupt
mitigation to reduce latencies while keeping or improving GRO
aggregation on fast NIC.

Idea is to test napi->gro_list at the end of a napi->poll() and
reschedule one NAPI poll, but after servicing a full round of
softirqs (timers, TX, rcu, ...). This will be allowed only if softirq
is currently serviced by idle task or ksoftirqd, and resched not needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Nov, 2014 3 commits

net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 · e0fb6fb6

Authored by Guenter Roeck on Oct 30, 2014

If a driver supports reading EEPROM but no EEPROM is installed in the system,
the driver's get_eeprom_len function returns 0. ethtool will subsequently
try to read that zero-length EEPROM anyway. If the driver does not support
EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
does support EEPROM access but no EEPROM is installed, the operation will
return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.
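
The resulting behaviour can be sketched as a single pre-check. The function below is hypothetical: it flattens the driver callbacks into two plain parameters so that the example stays self-contained, and it is not the actual ethtool code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Pre-check before an EEPROM read (illustrative helper).
 * driver_supports_eeprom: the driver provides EEPROM access at all.
 * eeprom_len: the length reported by the driver, 0 if none is installed.
 * Returns 0 when the read may proceed, -EOPNOTSUPP otherwise.
 */
int eeprom_read_precheck(bool driver_supports_eeprom, int eeprom_len)
{
	if (!driver_supports_eeprom)
		return -EOPNOTSUPP;
	if (eeprom_len == 0)
		return -EOPNOTSUPP;	/* same code as "no EEPROM support" */
	return 0;
}
```

User space then sees one error code for "nothing to read", whether the cause is a missing driver hook or a missing EEPROM.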

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Allow mpls_gso to be built as module · de05c400

Authored by Pravin B Shelar on Oct 30, 2014

Kconfig already allows mpls to be built as a module. The following patch
fixes the Makefile to do the same.

CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: Fix mpls_gso handler. · f7065f4b

Authored by Pravin B Shelar on Oct 30, 2014

The mpls gso handler needs to pull the skb after segmenting it.

CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Oct, 2014 18 commits

netfilter: nft_reject_bridge: restrict reject to prerouting and input · 127917c2

Authored by Pablo Neira Ayuso on Oct 27, 2014

Restrict the reject expression to the prerouting and input bridge
hooks. If we allow this to be used from forward or any other later
bridge hook, if the frame is flooded to several ports, we'll end up
sending several reject packets, one per cloned packet.
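
A restriction of this kind is typically a hook-mask check. The sketch below assumes the usual bridge hook ordering and uses a locally defined enum and a hypothetical function name, so it is an illustration rather than the patch itself.

```c
#include <errno.h>

/* Local mirror of the bridge netfilter hooks (assumed ordering). */
enum br_hook_sketch {
	BR_PRE_ROUTING,
	BR_LOCAL_IN,
	BR_FORWARD,
	BR_LOCAL_OUT,
	BR_POST_ROUTING
};

/*
 * Accept the reject expression only when every hook in hook_mask is
 * prerouting or input; later hooks may see one clone per flooded port.
 */
int reject_validate_hooks(unsigned int hook_mask)
{
	const unsigned int allowed = (1u << BR_PRE_ROUTING) |
				     (1u << BR_LOCAL_IN);

	if (hook_mask & ~allowed)
		return -EOPNOTSUPP;
	return 0;
}
```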

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_reject_bridge: don't use IP stack to reject traffic · 523b929d

Authored by Pablo Neira Ayuso on Oct 25, 2014

If the packet is received via the bridge stack, this cannot reject
packets from the IP stack.

This adds functions to build the reject packet and send it from the
bridge stack. Comments and assumptions on this patch:

1) Validate the IPv4 and IPv6 headers before further processing;
   given that the packet comes from the bridge stack, we cannot assume
   they are clean. Truncated packets are dropped; we follow a similar
   approach in the existing iptables match/target extensions that need
   to inspect layer 4 headers that are not available. This also includes
   packets that are directed to multicast and broadcast ethernet
   addresses.

2) br_deliver() is exported to inject the reject packet via
   bridge localout -> postrouting. So the approach is similar to what
   we already do in the iptables reject target. The reject packet is
   sent to the bridge port from which we have received the original
   packet.

3) The reject packet is forged based on the original packet. The TTL
   is set based on sysctl_ip_default_ttl for IPv4 and per-net
   ipv6.devconf_all hoplimit for IPv6.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions · 8bfcdf66

Authored by Pablo Neira Ayuso on Oct 26, 2014

That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions · 052b9498

Authored by Pablo Neira Ayuso on Oct 25, 2014

That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing · 4d87716c

Authored by Pablo Neira Ayuso on Oct 25, 2014

Fixes: 36d2af59 ("netfilter: nf_tables: allow to filter from prerouting and postrouting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets · 5188cd44

Authored by Ben Hutchings on Oct 30, 2014

UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_fclone_busy() needs to detect orphaned skb · 39bb5e62

Authored by Eric Dumazet on Oct 30, 2014

Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()

Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.
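
One way to picture the extra check is the sketch below. It assumes, for illustration only, that a fast-clone pair can be reduced to a reference count plus the socket that currently owns the clone; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of an original/clone skb pair. */
struct fclone_pair_sketch {
	int fclone_ref;			/* references held on the pair */
	const void *clone_owner;	/* owning socket, NULL once orphaned */
};

/*
 * The clone counts as "busy" (still sitting in a host queue on behalf
 * of sk) only if it is still referenced AND still owned by sk. After
 * the driver orphans it, the owner no longer matches and the
 * retransmit is not held back.
 */
bool fclone_busy_sketch(const void *sk, const struct fclone_pair_sketch *p)
{
	return p->fclone_ref > 1 && p->clone_owner == sk;
}
```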

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Correction to RFC number in comment · cd214535

Authored by Sowmini Varadhan on Oct 30, 2014

Challenge ACK is described in RFC 5961, fix typo.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: Use inner mac length when computing tunnel length · 14051f04

Authored by Tom Herbert on Oct 30, 2014

Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.
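
The difference between the two computations can be shown with plain byte offsets. The struct below is a hypothetical stand-in for the header offsets kept in an skb, and both function names are made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical header offsets, in bytes from the start of the packet. */
struct pkt_offsets_sketch {
	size_t transport;	/* start of the GRE header */
	size_t inner_mac;	/* start of the inner Ethernet header (TEB),
				 * equal to inner_network for IP-in-GRE */
	size_t inner_network;	/* start of the inner IP header */
};

/* Old computation: counts an inner Ethernet header as tunnel header. */
size_t gre_tnl_hlen_old(const struct pkt_offsets_sketch *o)
{
	return o->inner_network - o->transport;
}

/* New computation: stops at the inner MAC header. */
size_t gre_tnl_hlen_new(const struct pkt_offsets_sketch *o)
{
	return o->inner_mac - o->transport;
}
```

For a 4-byte GRE header carrying an Ethernet frame, the old form reports 18 bytes (4 + 14) while the new one reports 4; for plain IP encapsulation both report 4.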

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: replace seq_printf with seq_puts · afb6befc

Authored by Michele Baldessari on Oct 30, 2014

Fixes checkpatch warning:
"WARNING: Prefer seq_puts to seq_printf"

Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: add transport state in /proc/net/sctp/remaddr · 891310d5

Authored by Michele Baldessari on Oct 30, 2014

It is often quite helpful to be able to know the state of a transport
outside of the application itself (for troubleshooting purposes or for
monitoring purposes). Add it under /proc/net/sctp/remaddr.

Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Do not cache routing failures due to disabled forwarding. · fa19c2b0

Authored by Nicolas Cavallari on Oct 30, 2014

If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not.  Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them.  The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: spelling errors · b2ad5e5f

Authored by stephen hemminger on Oct 29, 2014

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

syncookies: only increment SYNCOOKIESFAILED on validation error · 646697b9

Authored by Florian Westphal on Oct 30, 2014

Only count packets that failed cookie-authentication.
Otherwise we can get SYNCOOKIESFAILED > 0 while we never even sent a single cookie.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: minor spelling fixes · f4e715c3

Authored by stephen hemminger on Oct 29, 2014

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: allow to change mode for the ip6tnl0 · acf722f7

Authored by Alexey Andriyanov on Oct 29, 2014

The fallback device is in ipv6 mode by default.
The mode cannot be changed at runtime, so there
is no way to decapsulate ip4in6 packets coming from
various sources without creating specific tunnel
ifaces for each peer.

This allows updating the fallback tunnel device, but only
the mode can be changed. The usual command should work for the
fallback device: `ip -6 tun change ip6tnl0 mode any`

The fallback device cannot be hidden from the packet receiver
like a regular tunnel, but there is no need for synchronization
as long as we do a single assignment.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove assignment in if condition · 43728fa5

Authored by Fabian Frederick on Oct 29, 2014

Do the assignment before the if condition and test !skb like in rawv6_recvmsg().

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove inline on static in c file · fc08c258

Authored by Fabian Frederick on Oct 29, 2014

Remove __inline__ / inline and let the compiler decide what to do
with static functions.

Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

openanolis / cloud-kernel synced successfully about 1 year ago