Commits · 620f3186caa8124e0efaf329751cf51c5d55c731 · openeuler / raspberrypi-kernel

Aug 30, 2013 · 7 commits

net: remove search_list from netdev_adjacent · 620f3186

Authored by Veaceslav Falico on Aug 28, 2013

We no longer need it, because every upper/lower device is already visible in
the list.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

620f3186

net: add lower_dev_list to net_device and make a full mesh · 5d261913

Authored by Veaceslav Falico on Aug 28, 2013

This patch adds lower_dev_list list_head to net_device, which is the same
as upper_dev_list, only for lower devices, and begins to use it in the same
way as the upper list.

It also changes the way the whole adjacent device lists work - now they
contain *all* of upper/lower devices, not only the first level. The first
level devices are distinguished by the bool neighbour field in
netdev_adjacent, also added by this patch.

There are cases when a device can be added several times to the adjacent
list, the simplest would be:

     /---- eth0.10 ---\
eth0-                  --- bond0
     \---- eth0.20 ---/

where both bond0 and eth0 'see' each other in the adjacent lists two times.
To avoid duplication of netdev_adjacent structures ref_nr is being kept as
the number of times the device was added to the list.

The 'full view' is achieved by adding, on link creation, all of the
upper_dev's upper_dev_list devices as upper devices to all of the
lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
versa. On unlink they are removed using the same logic.
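
As a rough illustration of the bookkeeping described above (not the kernel code itself), the sketch below models the adjacency lists as reference-count tables. The names `upper_ref`, `lower_ref`, `neighbour`, `adj_link`, `adj_unlink` and the four-device enum are hypothetical; it assumes an acyclic topology in which bond0 sits on top of both VLAN devices, as in the diagram.

```c
#include <stdbool.h>

/* Hypothetical device ids for the eth0 / eth0.10 / eth0.20 / bond0 example. */
enum { ETH0, ETH0_10, ETH0_20, BOND0, NDEV };

/* upper_ref[d][u]: times u was added to d's upper list (the role of ref_nr). */
static int upper_ref[NDEV][NDEV];
/* lower_ref[d][l]: times l was added to d's lower list. */
static int lower_ref[NDEV][NDEV];
/* neighbour[lower][upper]: true only for directly linked (first-level) pairs. */
static bool neighbour[NDEV][NDEV];

/*
 * Apply delta (+1 on link, -1 on unlink) to every pair (u, l) where u is
 * `upper` or one of its uppers and l is `lower` or one of its lowers.
 * Assumes an acyclic topology, so the rows being scanned are never the
 * rows being modified.
 */
static void adj_adjust(int upper, int lower, int delta)
{
    for (int u = 0; u < NDEV; u++) {
        if (u != upper && upper_ref[upper][u] == 0)
            continue;
        for (int l = 0; l < NDEV; l++) {
            if (l != lower && lower_ref[lower][l] == 0)
                continue;
            upper_ref[l][u] += delta;
            lower_ref[u][l] += delta;
        }
    }
}

/* Link `upper` on top of `lower` and extend the full mesh. */
static void adj_link(int upper, int lower)
{
    adj_adjust(upper, lower, +1);
    neighbour[lower][upper] = true;
}

/* Undo a previous adj_link() using the same traversal. */
static void adj_unlink(int upper, int lower)
{
    neighbour[lower][upper] = false;
    adj_adjust(upper, lower, -1);
}
```

Linking eth0.10 and eth0.20 over eth0 and then bond0 over both VLANs leaves `upper_ref[ETH0][BOND0]` at 2, which mirrors the "two times" case in the diagram; a single unlink drops it to 1 rather than removing the entry.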

I've tested it with thousands of vlans/bonds/bridges; everything works OK, with
no observable lag even on a huge number of interfaces.

Memory footprint for 128 devices interconnected with each other via both
upper and lower (which is impossible, but for the comparison) lists would be:

128*128*2*sizeof(netdev_adjacent) = 1.5MB

but in the real world we usually have at most several devices with slaves
and a lot of vlans, so the footprint will be much lower.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d261913

net: rename netdev_upper to netdev_adjacent · aa9d8560

Authored by Veaceslav Falico on Aug 28, 2013

Rename the structure to reflect the upcoming addition of lower_dev_list.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa9d8560

net: sctp: sctp_verify_init: clean up mandatory checks and add comment · 7613f5fe

Authored by Daniel Borkmann on Aug 27, 2013

Add a comment related to RFC4960 explaining why we do not check for initial
TSN, and while at it, remove yoda notation checks and clean up code from
checks of mandatory conditions. That's probably just really minor, but makes
reviewing easier.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7613f5fe

tcp: TSO packets automatic sizing · 95bd09eb

Authored by Eric Dumazet on Aug 27, 2013

After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses a heuristic
relying on upcoming ACKs instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt
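
To make the formula concrete, here is a minimal sketch of the arithmetic, not the kernel implementation. The helper names `pacing_rate_bytes_per_sec` and `tso_segs_per_ms`, the use of microseconds for srtt, and the `max_segs` cap are assumptions made for illustration; only the formula, the one-packet-per-ms goal and the minimum of 2 segments come from the text above.

```c
#include <stdint.h>

/*
 * sk_pacing_rate = 2 * cwnd * mss / srtt, expressed in bytes per second.
 * srtt_us is the smoothed RTT in microseconds (an assumption of this sketch).
 * Returns 0 when no RTT sample is available yet.
 */
static uint64_t pacing_rate_bytes_per_sec(uint32_t cwnd, uint32_t mss,
                                          uint32_t srtt_us)
{
    if (srtt_us == 0)
        return 0;
    return 2ULL * cwnd * mss * 1000000ULL / srtt_us;
}

/*
 * Number of MSS-sized segments that roughly cover 1 ms of data at `rate`,
 * clamped to [min_segs, max_segs]. min_segs plays the role of
 * tcp_min_tso_segs; max_segs is a hypothetical device/GSO limit.
 */
static uint32_t tso_segs_per_ms(uint64_t rate, uint32_t mss,
                                uint32_t min_segs, uint32_t max_segs)
{
    uint64_t segs = mss ? rate / 1000 / mss : 0;

    if (segs < min_segs)
        segs = min_segs;
    if (segs > max_segs)
        segs = max_segs;
    return (uint32_t)segs;
}
```

With cwnd = 10, mss = 1448 and a 100 ms RTT the rate is far below one segment per millisecond, so the result falls back to the minimum of 2 segments; a fast flow instead hits the upper cap, which matches the claim that high speed flows keep their large TSO packets.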

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95bd09eb

ipv6: drop fragmented ndisc packets by default (RFC 6980) · b800c3b9

Authored by Hannes Frederic Sowa on Aug 27, 2013

This patch implements RFC6980: Drop fragmented ndisc packets by
default. If a fragmented ndisc packet is received the user is informed
that it is possible to disable the check.

Cc: Fernando Gont <fernando@gont.com.ar>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b800c3b9

bridge: inherit slave devices needed_headroom · fd094808

Authored by Florian Fainelli on Aug 27, 2013

Some slave devices may have set a dev->needed_headroom value which is
different than the default one, most likely in order to prepend a
hardware descriptor in front of the Ethernet frame to send. Whenever a
new slave is added to a bridge, ensure that we update the
needed_headroom value accordingly to account for the slave
needed_headroom value.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd094808

Aug 28, 2013 · 14 commits

batman-adv: send GW_DEL event when the gw client mode is deselected · c6eaa3f0

Authored by Antonio Quartulli on Jul 13, 2013

Whenever the GW client mode is deselected, a DEL event has
to be sent in order to tell userspace that the current
gateway has been lost. Send the uevent on state change only
if a gateway was currently selected.
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>

c6eaa3f0

batman-adv: Start new development cycle · c00a072d

Authored by Simon Wunderlich on Jul 21, 2013

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

c00a072d

batman-adv: move enum definition at the top of the file · 791c2a2d

Authored by Antonio Quartulli on Aug 17, 2013

Signed-off-by: Antonio Quartulli <ordex@autistici.org>

791c2a2d

batman-adv: set skb priority according to content · c54f38c9

Authored by Simon Wunderlich on Jul 29, 2013

The skb priority field may help the wireless driver to choose the right
queue (e.g. WMM queues). This should be set in batman-adv, as this
information is only available here.

This patch adds support for IPv4/IPv6 DS fields and VLAN PCP. Note that
only VLAN PCP is used if a VLAN header is present. Also initially set
TC_PRIO_CONTROL only for self-generated packets, and keep the priority
set by higher layers.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>

c54f38c9

netfilter: ctnetlink: fix uninitialized variable · b7e092c0

Authored by Florian Westphal on Aug 27, 2013

net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect':
'helper' may be used uninitialized in this function

It was only initialized if the CTA_EXPECT_HELP_NAME attribute was
present; it must be NULL otherwise.

Problem added recently in bd077937
(netfilter: nfnetlink_queue: allow to attach expectations to conntracks).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b7e092c0

netfilter: add IPv6 SYNPROXY target · 4ad36228

Authored by Patrick McHardy on Aug 27, 2013

Add an IPv6 version of the SYNPROXY target. The main differences from the
IPv4 version are routing and IP header construction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4ad36228

net: syncookies: export cookie_v6_init_sequence/cookie_v6_check · 81eb6a14

Authored by Patrick McHardy on Aug 27, 2013

Extract the local TCP stack independent parts of tcp_v6_init_sequence()
and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

81eb6a14

netfilter: add SYNPROXY core/target · 48b1de4c

Authored by Patrick McHardy on Aug 27, 2013

Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order to not break PAWS, the timestamps are translated in
the direction server->client.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

48b1de4c

net: syncookies: export cookie_v4_init_sequence/cookie_v4_check · 0198230b

Authored by Patrick McHardy on Aug 27, 2013

Extract the local TCP stack independent parts of tcp_v4_init_sequence()
and cookie_v4_check() and export them for use by the upcoming SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0198230b

netfilter: nf_conntrack: make sequence number adjustments usable without NAT · 41d73ec0

Authored by Patrick McHardy on Aug 27, 2013

Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a separate extension. The extension is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

41d73ec0

netfilter: nf_defrag_ipv6.o included twice · 706f5151

Authored by Nathan Hintz on Aug 22, 2013

'nf_defrag_ipv6' is built as a separate module; it shouldn't be
included in the 'nf_conntrack_ipv6' module as well.
Signed-off-by: Nathan Hintz <nlhintz@hotmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

706f5151

netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged · affe759d

Authored by Phil Oester on Jun 26, 2013

As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT
with the tcp-reset option sends out reset packets with the src MAC address
of the local bridge interface, instead of the MAC address of the intended
destination. This causes some routers/firewalls to drop the reset packet
as it appears to be spoofed. Fix this by bypassing ip[6]_local_out and
setting the MAC of the sender in the tcp reset packet.

This closes netfilter bugzilla #531.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

affe759d

openvswitch: optimize flow compare and mask functions · 5828cd9a

Authored by Andy Zhou on Aug 27, 2013

Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.

This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, the original patch only optimizes for architectures that
support unaligned machine word access. This patch optimizes for all
architectures.
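
The word-at-a-time idea can be sketched as follows; this is an illustration under stated assumptions, not the Open vSwitch code. The `union flow_key` layout, `KEY_WORDS`, and the helpers `key_equal` and `key_mask` are hypothetical names; the only property taken from the text above is that the key and the compared range are always machine word aligned, so no unaligned access is needed.

```c
#include <stdbool.h>
#include <stddef.h>

#define KEY_WORDS 8 /* hypothetical key size, in machine words */

/* A flow key stored as whole machine words, so it is always word aligned. */
union flow_key {
    unsigned long words[KEY_WORDS];
    unsigned char bytes[KEY_WORDS * sizeof(unsigned long)];
};

/*
 * Compare bytes [start, end) of two keys one machine word at a time.
 * start and end must be multiples of sizeof(unsigned long).
 */
static bool key_equal(const union flow_key *a, const union flow_key *b,
                      size_t start, size_t end)
{
    unsigned long diff = 0;

    for (size_t i = start / sizeof(unsigned long);
         i < end / sizeof(unsigned long); i++)
        diff |= a->words[i] ^ b->words[i];
    return diff == 0;
}

/* dst = src & mask over bytes [start, end), again one word per step. */
static void key_mask(union flow_key *dst, const union flow_key *src,
                     const union flow_key *mask, size_t start, size_t end)
{
    for (size_t i = start / sizeof(unsigned long);
         i < end / sizeof(unsigned long); i++)
        dst->words[i] = src->words[i] & mask->words[i];
}
```

Accumulating the XOR of each word pair and testing once at the end keeps the loop branch-free, which is one plausible reason such a comparator can beat a generic byte-wise `memcmp()` on short, fixed-layout keys.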

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

5828cd9a

net: tcp_probe: allow more advanced ingress filtering by mark · b1dcdc68

Authored by Daniel Borkmann on Aug 23, 2013

Currently, the tcp_probe snooper can either filter packets by a given
port (handed to the module via module parameter e.g. port=80) or lets
all TCP traffic pass (port=0, default). When a port is specified, the
port number is tested against the sk's source/destination port. Thus,
if one of them matches, the information will be further processed for
the log.

As this is quite limited, allow for more advanced filtering possibilities
which can facilitate debugging/analysis with the help of the tcp_probe
snooper. Therefore, similarly as added to BPF machine in commit 7e75f93e
("pkt_sched: ingress socket filter by mark"), add the possibility to
use skb->mark as a filter.

If the mark is not being used otherwise, this allows ingress filtering
by flow (e.g. in order to track updates from only a single flow, or a
subset of all flows for a given port) and other things such as dynamic
logging and reconfiguration without removing/re-inserting the tcp_probe
module, etc. Simple example:

  insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1
  ...
  iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \
           --sport 60952 -j MARK --set-mark 8888
  [... sampling interval ...]
  iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \
           --sport 60952 -j MARK --set-mark 8888

The current option to filter by a given port is still being preserved. A
similar approach could be done for the sctp_probe module as a follow-up.
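
The filtering rule described above can be sketched as a small predicate. This is only an illustration: the function name `probe_match`, the use of host byte order, and the choice to treat the port and mark filters as alternatives (either one matching is enough) are assumptions, not something the commit message spells out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a packet should be logged.
 * port == 0 and fwmark == 0 mean "no filter of that kind is configured".
 * All values are assumed to be in host byte order.
 */
static bool probe_match(uint16_t sport, uint16_t dport, uint32_t mark,
                        uint16_t port, uint32_t fwmark)
{
    if (port == 0 && fwmark == 0)
        return true; /* default: let all TCP traffic pass */
    if (port != 0 && (sport == port || dport == port))
        return true; /* classic port filter on either endpoint */
    return fwmark != 0 && mark == fwmark; /* filter on skb->mark */
}
```

With `fwmark=8888` and no port filter, only packets that the iptables rule in the example has marked with 8888 would be logged.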

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1dcdc68

Aug 27, 2013 · 2 commits

openvswitch: Rename key_len to key_end · 02237373

Authored by Andy Zhou on Aug 22, 2013

Key_end is a better name describing the ending boundary than key_len.
Rename those variables to make it less confusing.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

02237373

openvswitch: Add SCTP support · a175a723

Authored by Joe Stringer on Aug 22, 2013

This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.

Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

a175a723

Aug 26, 2013 · 1 commit

ipip: potential race in ip_tunnel_init_net() · b4de77ad

Authored by Dan Carpenter on Aug 23, 2013

Eric Dumazet says that my previous fix for an ERR_PTR dereference
(ea857f28 'ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()')
could be racy and suggests the following fix instead.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4de77ad

Aug 24, 2013 · 7 commits

openvswitch: Mega flow implementation · 03f0d916

Authored by Andy Zhou on Aug 07, 2013

Add wildcarded flow support in kernel datapath.

Wildcarded flows can improve OVS flow setup performance by avoiding sending
matching new flows to the user space program. The exact performance boost
will largely depend on the wildcarded flow hit rate.

In case all new flows hit wildcarded flows, the flow setup rate is
within 5% of that of the Linux bridge module.

Pravin has made significant contributions to this patch, including API
cleanups and bug fixes.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

03f0d916

openvswitch: check CONFIG_OPENVSWITCH_GRE in makefile · 3fa34de6

Authored by Cong Wang on Aug 20, 2013

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

3fa34de6

openvswitch: Fix argument descriptions in vport.c. · 2694838d

Authored by Justin Pettit on Aug 19, 2013

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

2694838d

openvswitch:: link upper device for port devices · 2537b4dd

Authored by Jiri Pirko on Jul 26, 2013

Link upper device properly. That will make IFLA_MASTER filled up.
Set the master to port 0 of the datapath under which the port belongs.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jesse Gross <jesse@nicira.com>

2537b4dd

openvswitch: Use non rcu hlist_del() flow table entry. · 76a66c7e

Authored by Pravin B Shelar on Jul 30, 2013

Flow table destroy is done in rcu call-back context.  Therefore
there is no need to use rcu variant of hlist_del().

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

76a66c7e

openvswitch: Use RCU lock for dp dump operation. · 59a35d60

Authored by Pravin B Shelar on Jul 30, 2013

RCUfy dp-dump operation which is already read-only. This
makes all ovs dump operations lockless.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

59a35d60

openvswitch: Use RCU lock for flow dump operation. · d57170b1

Authored by Pravin B Shelar on Jul 30, 2013

The flow dump operation is read-only, so there is no need to
take ovs-lock. The following patch uses rcu-lock for dumping flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

d57170b1

Aug 23, 2013 · 7 commits

net: sctp_probe: simplify code by using %pISc format specifier · 05f147ef

Authored by Daniel Borkmann on Aug 22, 2013

We can simply use the %pISc format specifier that was recently added
and thus remove some code that distinguishes between IPv4 and IPv6.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05f147ef

ipv6: handle Redirect ICMP Message with no Redirected Header option · c92a59ec

Authored by Duan Jiong on Aug 22, 2013

RFC 4861 says the Redirected Header option is optional, so
the kernel should not drop a Redirect Message that has no
Redirected Header option. In this patch, the function
ip6_redirect_no_header() is introduced to deal with that
condition.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

c92a59ec

net: tcp_probe: add IPv6 support · f925d0a6

Authored by Daniel Borkmann on Aug 21, 2013

The tcp_probe currently only supports analysis of IPv4 connections.
Therefore, it would be nice to have IPv6 supported as well. Since we
have the recently added %pISpc specifier that is IPv4/IPv6 generic,
build related sockaddress structures from the flow information and
pass this to our format string. Tested with SSH and HTTP sessions
on IPv4 and IPv6.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f925d0a6

net: tcp_probe: kprobes: adapt jtcp_rcv_established signature · d8cdeda6

Authored by Daniel Borkmann on Aug 21, 2013

This patch fixes a rather unproblematic function signature mismatch,
as the const specifier was missing for the th variable; and next to
that it adds a build-time assertion so that future function signature
mismatches for kprobes will not end badly, similarly as commit 22222997
("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack")
did it for SCTP.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8cdeda6

net: tcp_probe: also include rcv_wnd next to snd_wnd · b4c1c1d0

Authored by Daniel Borkmann on Aug 21, 2013

It is helpful to sometimes know the TCP window sizes of an established
socket e.g. to confirm that window scaling is working or to tweak the
window size to improve high-latency connections, etc etc. Currently the
TCP snooper only exports the send window size, but not the receive window
size. Therefore, also add the receive window size to the end of the
output line.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4c1c1d0

tcp: increase throughput when reordering is high · 0f7cc9a3

Authored by Yuchung Cheng on Aug 21, 2013

The stack currently detects reordering and avoids spurious
retransmission very well. However the throughput is sub-optimal under
high reordering because cwnd is increased only if the data is delivered
in order, i.e., the FLAG_DATA_ACKED check in tcp_ack().  The more packets
are reordered the worse the throughput is.

Therefore when reordering is proven high, cwnd should advance whenever
the data is delivered regardless of its ordering. If reordering is low,
conservatively advance cwnd only on ordered deliveries in Open state,
and retain cwnd in Disordered state (RFC5681).

Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms
to 55ms (for reordering effect). This change increases TCP throughput
by 20 - 25% to near bottleneck BW.

A special case is the stretched ACK with new SACK and/or ECE mark.
For example, a receiver may receive an out of order or ECN packet with
unacked data buffered because of LRO or delayed ACK. The principle on
such an ACK is to advance cwnd on the cumulative acked part first,
then reduce cwnd in tcp_fastretrans_alert().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f7cc9a3

Revert "genetlink: fix family dump race" · 9d47b380

Authored by Johannes Berg on Aug 21, 2013

This reverts commit 58ad436f.

It turns out that the change introduced a potential deadlock
by causing a locking dependency with netlink's cb_mutex. I
can't seem to find a way to resolve this without doing major
changes to the locking, so revert this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d47b380

Aug 21, 2013 · 2 commits

net: ipv6: mcast: minor: use defines for rfc3810/8.1 lengths · 9fd07841

Authored by Daniel Borkmann on Aug 20, 2013

Instead of hard-coding length values, use a define to make it clear
where those lengths come from.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fd07841

net: ipv6: minor: *_start_timer: rather use unsigned long · c2cef4e8

Authored by Daniel Borkmann on Aug 20, 2013

For the functions mld_gq_start_timer(), mld_ifc_start_timer(),
and mld_dad_start_timer(), rather use unsigned long than int
as we operate only on unsigned values anyway. This seems more
appropriate as there is no good reason to do type conversions
to int, that could lead to future errors.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2cef4e8