Commits · f46ad73ac6965f71d1a2ba217c4eb5b644e62949 · openeuler / raspberrypi-kernel

07 Nov, 2014 · 2 commits

net: esp: Convert NETDEBUG to pr_info · 45083497

Committed by Joe Perches on Nov 05, 2014

Commit 64ce2073 ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.

Commit a2a316fd ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.

Convert these NETDEBUG uses to normal pr_info calls.

This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.

These output lines are now like the other messages in the files.

Other miscellanea:

Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
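The prefix change comes from the kernel's printk convention: pr_info() expands to printk(KERN_INFO pr_fmt(fmt), ...), so each file's pr_fmt definition is pasted in front of the format string at compile time. A minimal userspace sketch of that mechanism, assuming the "IPSec: " prefix quoted above; pr_info_buf() is a hypothetical stand-in that formats with snprintf() instead of printk().

```c
#include <stdio.h>

/* Assumed per-file prefix, as quoted in the commit message. In the kernel
 * this is defined before including printk.h so pr_info() picks it up. */
#define pr_fmt(fmt) "IPSec: " fmt

/* Hypothetical userspace stand-in for pr_info(): the prefix is joined to the
 * format string by the preprocessor, then formatted into buf. */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)
```

Because the concatenation happens in the preprocessor, a message such as "bad length %d\n" becomes "IPSec: bad length %d\n" with no runtime cost.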


net; ipv[46] - Remove 2 unnecessary NETDEBUG OOM messages · cbffccc9

Committed by Joe Perches on Nov 05, 2014

These messages aren't useful as there's a generic dump_stack()
on OOM.

Neaten the comment and if test above the OOM by separating the
assign in if into an allocation then if test.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
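The "allocation, then if test" style mentioned above separates the assignment from the NULL check instead of nesting it inside the if condition. A small userspace sketch; dup_header() is a hypothetical helper and malloc() stands in for the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, illustrating the style only. */
static char *dup_header(const char *src, size_t len)
{
	char *buf;

	/* Old style: if (!(buf = malloc(len))) { print OOM message; ... } */
	buf = malloc(len);
	if (!buf)
		return NULL; /* in the kernel, the allocator already dumps a stack on OOM */

	memcpy(buf, src, len);
	return buf;
}
```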


06 Nov, 2014 · 14 commits

net: Remove MPLS GSO feature. · 59b93b41

Committed by Pravin B Shelar on Nov 05, 2014

A device can export MPLS GSO support in dev->mpls_features the same way
it exports VLAN features in dev->vlan_features. So it is safe to
remove the redundant NETIF_F_GSO_MPLS flag.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>


fou: Fix typo in returning flags in netlink · e1b2cb65

Committed by Tom Herbert on Nov 05, 2014

When filling netlink info, dport is being returned as flags. Fix
instances to return correct value.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · 4c672e4b

Committed by Daniel Borkmann on Nov 05, 2014

It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():

skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
 <IRQ>
 [<ffffffff80578226>] ? skb_put+0x3a/0x3b
 [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
 [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
 [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
 [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70

mld_newpack() skb allocations are usually requested with dev->mtu
in size; since commit 72e09ad1 ("ipv6: avoid high order allocations"),
we have changed the limit so that allocations are less likely to fail.

However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.

The IGMP case doesn't have this issue as commit 57e1ab6e
("igmp: refine skb allocations") stores the allocation size in
the cb[].

Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().

Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
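The difference between the two tailroom checks can be modeled in a few lines. This is a simplified sketch, not the kernel code: struct fake_skb and its helpers are hypothetical, len counts bytes already written, end is the size actually allocated, and the reservation rule (reserve whatever part of the allocation lies beyond the MTU) is an assumption based on the description above.

```c
#include <stddef.h>

/* Hypothetical model of the skb fields that matter here. */
struct fake_skb {
	size_t len;               /* bytes already written */
	size_t end;               /* bytes actually allocated for data */
	size_t reserved_tailroom; /* tail bytes that must stay unused */
	size_t mtu;               /* skb->dev->mtu */
};

/* Old AVAILABLE()-style check: trusts the MTU, not the allocation. */
static size_t avail_by_mtu(const struct fake_skb *skb)
{
	return skb->mtu > skb->len ? skb->mtu - skb->len : 0;
}

/* skb_availroom()-style check: free tailroom minus the reserved part. */
static size_t avail_by_room(const struct fake_skb *skb)
{
	size_t tailroom = skb->end - skb->len;

	return tailroom > skb->reserved_tailroom ?
	       tailroom - skb->reserved_tailroom : 0;
}

/* At allocation time: reserve the part of the buffer beyond the MTU. */
static void set_reserved(struct fake_skb *skb)
{
	size_t tailroom = skb->end - skb->len;
	size_t limit = skb->mtu < tailroom ? skb->mtu : tailroom;

	skb->reserved_tailroom = tailroom - limit;
}
```

With a 9000-byte MTU but a smaller buffer, the MTU-based check can report thousands of free bytes when only a few remain, which is how another skb_put() can run past the end of the allocation.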


net: Convert SEQ_START_TOKEN/seq_printf to seq_puts · 1744bea1

Committed by Joe Perches on Nov 04, 2014

Using a single fixed string is smaller code size than using
a format and many string arguments.

Reduces overall code size a little.

$ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
   text	   data	    bss	    dec	    hex	filename
  34269	   7012	  14824	  56105	   db29	net/ipv4/igmp.o.new
  34315	   7012	  14824	  56151	   db57	net/ipv4/igmp.o.old
  30078	   7869	  13200	  51147	   c7cb	net/ipv6/mcast.o.new
  30105	   7869	  13200	  51174	   c7e6	net/ipv6/mcast.o.old
  11434	   3748	   8580	  23762	   5cd2	net/ipv6/ip6_flowlabel.o.new
  11491	   3748	   8580	  23819	   5d0b	net/ipv6/ip6_flowlabel.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: zero retrans_stamp if all retrans were acked · 1f37bf87

Committed by Marcelo Leitner on Nov 04, 2014

Ueki Kohei reported that when we are using NewReno with connections that
have very low traffic, we may time out the connection too early if a
second loss occurs after the first one was successfully acked but no
data was transferred later. Below is his description of it:

When SACK is disabled, and a socket suffers multiple separate TCP
retransmissions, that socket's ETIMEDOUT value is calculated from the
time of the *first* retransmission instead of the *latest*
retransmission.

This happens because the tcp_sock's retrans_stamp is set once then never
cleared.

Take the following connection:

                      Linux                    remote-machine
                        |                           |
         send#1---->(*1)|--------> data#1 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*2)|----> data#1(retrans) ---->|
                  | (*3)|<---------- ACK <----------|
                  |     |                           |
                  |     :                           :
                  |     :                           :
                  |     :                           :
                16 minutes (or more)                :
                  |     :                           :
                  |     :                           :
                  |     :                           :
                  |     |                           |
         send#2---->(*4)|--------> data#2 --------->|
                  |     |                           |
                 RTO    :                           :
                  |     |                           |
                 ---(*5)|----> data#2(retrans) ---->|
                  |     |                           |
                  |     |                           |
                RTO*2   :                           :
                  |     |                           |
                  |     |                           |
      ETIMEDOUT<----(*6)|                           |

(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.

At this point the first "retransmission event" has passed and been
recovered from. Any future retransmission is a completely new "event".

(*4) After 16 minutes (to correspond with retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and (*4).

The socket's timeout SHOULD be calculated from this point in time, but
instead it's calculated from the prior "event" 16 minutes ago.

(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT.

Therefore, now we clear retrans_stamp as soon as all data during the
loss window is fully acked.

Reported-by: Ueki Kohei
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
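A toy model of the timeline above. Everything here is hypothetical and heavily simplified: times are in seconds, 0 means "no retransmission pending", sequence wraparound is ignored, and on_ack() clears the stamp once everything sent before the loss episode (up to high_seq) has been acknowledged, which is the behavior the commit describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified connection state (times in seconds). */
struct conn {
	uint32_t retrans_stamp; /* start of the current loss episode, 0 = none */
	uint32_t snd_una;       /* oldest unacknowledged sequence number */
	uint32_t high_seq;      /* snd_nxt when the loss episode started */
};

/* Called on every retransmission: only the first one stamps the episode. */
static void on_retransmit(struct conn *c, uint32_t now, uint32_t snd_nxt)
{
	if (!c->retrans_stamp) {
		c->retrans_stamp = now;
		c->high_seq = snd_nxt;
	}
}

/* The fix: once all data from the loss window is acked, forget the stamp.
 * Sequence-number wraparound is ignored in this model. */
static void on_ack(struct conn *c, uint32_t ack_seq)
{
	if (ack_seq > c->snd_una)
		c->snd_una = ack_seq;
	if (c->retrans_stamp && c->snd_una >= c->high_seq)
		c->retrans_stamp = 0;
}

/* ETIMEDOUT check, measured from the start of the current episode. */
static bool timed_out(const struct conn *c, uint32_t now, uint32_t limit)
{
	return c->retrans_stamp && now - c->retrans_stamp >= limit;
}
```

Replaying the diagram, the first episode is fully acked at (*3) and the stamp is cleared, so the retransmission at (*5) starts a fresh stamp. Without the clearing, the stamp from 16 minutes earlier would make the timeout fire almost immediately.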


net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b

Committed by David S. Miller on Nov 05, 2014

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be fewer places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
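The shape of the helper can be illustrated in userspace with POSIX struct msghdr and struct iovec. Both functions below are hypothetical stand-ins: copy_datagram_iovec() plays the role of skb_copy_datagram_iovec() on a flat buffer, and copy_datagram_msg() is the wrapper that takes the whole msghdr, so a later switch to an iterator stored in the msghdr would only touch the wrapper.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical stand-in: copy len bytes of data, starting at offset,
 * into the iovec array. Returns 0 on success, -1 on error. */
static int copy_datagram_iovec(const char *data, size_t data_len, size_t offset,
			       struct iovec *iov, size_t iovlen, size_t len)
{
	size_t i;

	if (offset > data_len || len > data_len - offset)
		return -1;
	for (i = 0; i < iovlen && len; i++) {
		size_t n = iov[i].iov_len < len ? iov[i].iov_len : len;

		memcpy(iov[i].iov_base, data + offset, n);
		offset += n;
		len -= n;
	}
	return len ? -1 : 0; /* -1 if the iovecs were too small */
}

/* The helper pattern: callers hand over the msghdr, not msg->msg_iov. */
static int copy_datagram_msg(const char *data, size_t data_len, size_t offset,
			     struct msghdr *msg, size_t len)
{
	return copy_datagram_iovec(data, data_len, offset, msg->msg_iov,
				   msg->msg_iovlen, len);
}
```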


gue: Receive side of remote checksum offload · a8d31c12

Committed by Tom Herbert on Nov 04, 2014

Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. This implements patching the affected
checksum to derive the offloaded checksum.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
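The checksum patch is ordinary one's complement arithmetic. The sketch below is a simplified model, not the GUE receive code: it assumes the inner checksum field (at start + offset) was pre-seeded by the sender, as with CHECKSUM_PARTIAL, and completes it by folding the sum from start to the end of the packet. remcsum_patch() and the other helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of buf[0..len) to a 32-bit accumulator. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */
	return sum;
}

/* Fold carries back in and complement (RFC 1071 style). */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical patch step: the field holds a seed (e.g. a pseudo-header
 * sum); replace it with the folded sum of everything from start onward. */
static void remcsum_patch(uint8_t *pkt, size_t pkt_len, size_t start, size_t offset)
{
	uint16_t csum = csum_fold(csum_add_bytes(0, pkt + start, pkt_len - start));

	pkt[start + offset] = csum >> 8;
	pkt[start + offset + 1] = csum & 0xff;
}
```

After patching, adding the seed back to the patched region folds to zero, which is the usual validity check for an Internet checksum.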


gue: TX support for using remote checksum offload option · b17f709a

Committed by Tom Herbert on Nov 04, 2014

Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


udp: Changes to udp_offload to support remote checksum offload · e585f236

Committed by Tom Herbert on Nov 04, 2014

Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).

Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


gue: Add infrastructure for flags and options · 5024c33a

Committed by Tom Herbert on Nov 04, 2014

Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
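A sketch of the flag checks described above. The flag bits, their field sizes and the helper names are hypothetical; hlen is assumed to count 32-bit words, as in the GUE draft.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; each set flag implies one optional field. */
#define GUE_FLAG_A      0x8000u /* implies a 4-byte optional field */
#define GUE_FLAG_B      0x4000u /* implies an 8-byte optional field */
#define GUE_KNOWN_FLAGS (GUE_FLAG_A | GUE_FLAG_B)

/* Total length in bytes of the optional fields selected by flags. */
static unsigned int gue_optlen_sketch(uint16_t flags)
{
	unsigned int len = 0;

	if (flags & GUE_FLAG_A)
		len += 4;
	if (flags & GUE_FLAG_B)
		len += 8;
	return len;
}

/* Reject unknown flags, and optional fields that do not fit in hlen. */
static bool validate_flags_sketch(uint16_t flags, uint8_t hlen)
{
	if (flags & ~GUE_KNOWN_FLAGS)
		return false;
	return gue_optlen_sketch(flags) <= (unsigned int)hlen * 4;
}
```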


udp: Offload outer UDP tunnel csum if available · 4bcb877d

Committed by Tom Herbert on Nov 04, 2014

In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Move fou_build_header into fou.c and refactor · 63487bab

Committed by Tom Herbert on Nov 04, 2014

Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
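The refactor gives a single entry point that dispatches on the encapsulation type. A hypothetical sketch of the length side of that dispatch: the enum and helper names are illustrative, the 8 bytes are the size of a UDP header, and the 4-byte base GUE header is an assumption.

```c
/* Hypothetical encapsulation types. */
enum encap_type_sketch { ENCAP_NONE, ENCAP_FOU, ENCAP_GUE };

#define UDP_HLEN    8 /* size of struct udphdr */
#define GUEHDR_HLEN 4 /* assumed size of the fixed GUE header */

static int fou_hlen_sketch(void)
{
	return UDP_HLEN;
}

static int gue_hlen_sketch(int optlen)
{
	return UDP_HLEN + GUEHDR_HLEN + optlen;
}

/* One entry point, per-type helpers; -1 for an unknown type. */
static int encap_hlen_sketch(enum encap_type_sketch type, int gue_optlen)
{
	switch (type) {
	case ENCAP_NONE:
		return 0;
	case ENCAP_FOU:
		return fou_hlen_sketch();
	case ENCAP_GUE:
		return gue_hlen_sketch(gue_optlen);
	}
	return -1;
}
```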


geneve: Unregister pernet subsys on module unload. · d3ca9eaf

Committed by Jesse Gross on Nov 03, 2014

The pernet ops aren't ever unregistered, which causes a memory
leak and an oops if the module is ever reinserted.

Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


geneve: Set GSO type on transmit. · 45cac46e

Committed by Jesse Gross on Nov 03, 2014

Geneve does not currently set the inner protocol type when
transmitting packets. This causes GSO segmentation to fail on NICs
that do not support Geneve offloading.

CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


05 Nov, 2014 · 14 commits

udp: remove blank line between set and test · 6cf1093e

Committed by Fabian Frederick on Nov 04, 2014

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


esp4: remove assignment in if condition · 05006e8c

Committed by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: allow setting ecn via routing table · f7b3bec6

Committed by Florian Westphal on Nov 03, 2014

This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
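The per-route rule reduces to a small predicate. A hypothetical model, assuming the sysctl values described above (1 = request ECN on outgoing connections, 2 = only accept ECN when the peer requests it) and a per-route flag set by `ip route ... features ecn`; the function name is illustrative.

```c
#include <stdbool.h>

/* Hypothetical: should an outgoing connection request ECN? The route
 * feature gives tcp_ecn=1 behaviour for that route only. */
static bool should_request_ecn(int sysctl_tcp_ecn, bool route_has_ecn_feature)
{
	return sysctl_tcp_ecn == 1 || route_has_ecn_feature;
}
```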


syncookies: split cookie_check_timestamp() into two functions · f1673381

Committed by Florian Westphal on Nov 03, 2014

The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.

We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.

While this is the natural place for decoding options such as ECN which
commit 172d69e6 ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.

Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.

This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.

This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
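The split can be sketched as a predicate that only decides whether ECN stays on, rather than dropping the request. A hypothetical model: the ECN bit position (bit 5 of the echoed timestamp) is assumed, and dst_ecn_feature stands in for the route metric that cookie_{v4,v6}_check() can consult after its existing route lookup.

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_ECN_BIT (1u << 5) /* assumed position of the ECN bit */

/* Hypothetical cookie_ecn_ok()-like check: a "false" here only turns
 * ECN off for the connection; it never discards the request socket. */
static bool cookie_ecn_ok_sketch(uint32_t echoed_tsval, int sysctl_tcp_ecn,
				 bool dst_ecn_feature)
{
	bool ecn_ok = echoed_tsval & TS_ECN_BIT;

	if (!ecn_ok)
		return false;
	if (sysctl_tcp_ecn)
		return true;
	return dst_ecn_feature; /* per-route setting, from the later lookup */
}
```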


syncookies: avoid magic values and document which-bit-is-what-option · 274e2da0

Committed by Florian Westphal on Nov 03, 2014

Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
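The encoding scheme packs the negotiated options into the low bits of the SYN-ACK timestamp. The layout below (window scale in bits 0-3 with 0xf meaning "none", SACK in bit 4, ECN in bit 5, time in the remaining bits) is believed to match the defines this commit introduces, but treat it as an assumption; encode_ts() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the low bits of the SYN-ACK TSval. */
#define TS_OPT_WSCALE_MASK 0xfu      /* bits 0-3: window scale, 0xf = none */
#define TS_OPT_SACK        (1u << 4) /* bit 4: SACK permitted */
#define TS_OPT_ECN         (1u << 5) /* bit 5: ECN */
#define TSBITS             6
#define TSMASK             ((1u << TSBITS) - 1)

/* Keep the high bits of the clock, replace the low TSBITS with options. */
static uint32_t encode_ts(uint32_t now, unsigned int wscale, bool sack, bool ecn)
{
	uint32_t opts = wscale & TS_OPT_WSCALE_MASK;

	if (sack)
		opts |= TS_OPT_SACK;
	if (ecn)
		opts |= TS_OPT_ECN;
	return (now & ~TSMASK) | opts;
}
```

Decoding is the reverse: the peer echoes the value in its TSecr, and the options are read back by masking the same bits.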


igmp: remove camel case definitions · 436f7c20

Committed by Fabian Frederick on Nov 04, 2014

Use standard uppercase for definitions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


udp: remove else after return · c18450a5

Committed by Fabian Frederick on Nov 04, 2014

The else is unnecessary after return 0 in __udp4_lib_rcv().

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


inet: frags: remove inline on static in c file · aa1f731e

Committed by Fabian Frederick on Nov 04, 2014

Remove __inline__ / inline and let the compiler decide what to do
with static functions.

Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv4: remove 0/NULL assignment on static · 0d3979b9

Committed by Fabian Frederick on Nov 04, 2014

Static variables are automatically initialized to 0.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv4: use seq_puts instead of seq_printf where possible · c9f503b0

Committed by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: spelling s/plugable/pluggable · b92022f3

Committed by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


cipso: remove NULL assignment on static · 988b1343

Committed by Fabian Frederick on Nov 04, 2014

Also add a blank line after structure declarations.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv4: include linux/bug.h instead of asm/bug.h · 4c787b16

Committed by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


cipso: kerneldoc warning fix · 4973404f

Committed by Fabian Frederick on Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


31 Oct, 2014 · 7 commits

netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions · 052b9498

Committed by Pablo Neira Ayuso on Oct 25, 2014

That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


net: skb_fclone_busy() needs to detect orphaned skb · 39bb5e62

Committed by Eric Dumazet on Oct 30, 2014

Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan().

The problem is that skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <davem@davemloft.net>
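A hypothetical model of the check: a fast clone is treated as busy only while the driver still holds it and it is still owned by the socket. Once a driver calls skb_orphan(), the owner pointer is cleared, so the clone should no longer block retransmission. The struct and function names are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

struct sock_sketch {
	int id;
};

/* Hypothetical clone state: the original stays in the TCP write queue,
 * the clone is handed to the driver. */
struct clone_sketch {
	bool in_flight;               /* driver has not freed the clone yet */
	const struct sock_sketch *sk; /* owner; NULL after skb_orphan() */
};

/* Busy only if the clone is still held AND still owned by this socket. */
static bool fclone_busy_sketch(const struct sock_sketch *sk,
			       const struct clone_sketch *clone)
{
	return clone->in_flight && clone->sk == sk;
}
```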


tcp: Correction to RFC number in comment · cd214535

Committed by Sowmini Varadhan on Oct 30, 2014

Challenge ACK is described in RFC 5961, fix typo.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


gre: Use inner mac length when computing tunnel length · 14051f04

Committed by Tom Herbert on Oct 30, 2014

Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv4: Do not cache routing failures due to disabled forwarding. · fa19c2b0

Committed by Nicolas Cavallari on Oct 30, 2014

If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not.  Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them.  The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>


syncookies: only increment SYNCOOKIESFAILED on validation error · 646697b9

Committed by Florian Westphal on Oct 30, 2014

Only count packets that failed cookie-authentication.
We can get SYNCOOKIESFAILED > 0 while we never even sent a single cookie.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv4: minor spelling fixes · f4e715c3

Committed by stephen hemminger on Oct 29, 2014

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


30 Oct, 2014 · 3 commits

inet: frags: remove the WARN_ON from inet_evict_bucket · d70127e8

Committed by Nikolay Aleksandrov on Oct 28, 2014

The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbf ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


inet: frags: fix a race between inet_evict_bucket and inet_frag_kill · 65ba1f1e

Committed by Nikolay Aleksandrov on Oct 28, 2014

When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
 - then the timer ref is removed, but inet_evict_bucket will not add
   this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
 - then the timer ref is kept until the evictor "expires" the queue, but
   inet_frag_kill will remove the initial ref and will set
   INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
   its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbf ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
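The fix boils down to checking a flag that is only ever written under the chain lock. A hypothetical single-threaded model: both functions are assumed to run with the chain lock held (locking is not modeled), and the field and function names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical queue state. */
struct frag_queue_sketch {
	bool evicted;  /* set only under the chain lock, by the evictor */
	bool on_chain; /* still linked in the hash chain */
};

/* Evictor (chain lock held): detach to its private list and mark it. */
static void evict_take(struct frag_queue_sketch *q)
{
	q->on_chain = false;
	q->evicted = true;
}

/* Kill path (chain lock held): unlink only if the evictor has not
 * already detached the queue. Returns true if unlinked here. */
static bool kill_unlink(struct frag_queue_sketch *q)
{
	if (q->evicted)
		return false;
	q->on_chain = false;
	return true;
}
```

Whichever side takes the lock second sees the state the first one left, so the queue is unlinked from the chain exactly once.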


tcp: allow for bigger reordering level · dca145ff

Committed by Eric Dumazet on Oct 27, 2014

While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.

[1] Aggregation of links, per packet load balancing, fabrics not doing
 deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
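A hypothetical sketch of how such a cap applies: the reordering estimate grows with the observed metric but never beyond the admin-tunable limit, whose default of 300 matches the new maximum described above. The variable stands in for the new sysctl (assumed to be exposed as net.ipv4.tcp_max_reordering), and the update rule is simplified.

```c
#include <stdint.h>

/* Stand-in for the new sysctl; 300 is the default described above. */
static int max_reordering_sysctl = 300;

/* Raise the estimate to the observed metric, capped by the sysctl. */
static uint32_t update_reordering(uint32_t current, uint32_t metric)
{
	uint32_t r = metric > current ? metric : current;
	uint32_t cap = (uint32_t)max_reordering_sysctl;

	return r < cap ? r : cap;
}
```

Under the old hard limit of 127, a path that reorders by more than 127 packets would be capped there; with the sysctl, an administrator can raise or lower the ceiling without rebuilding.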
