Commits · d4ac32365dcbfd341a87eae444c26679f889249a · openeuler / raspberrypi-kernel

27 Mar, 2013 14 commits

6lowpan: store fragment tag values per device instead of net stack wide · d4ac3236

Committed by Tony Cheneau on Mar 25, 2013

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: add debug messages for 6LoWPAN fragmentation · 9da2924c

Committed by Tony Cheneau on Mar 25, 2013

Add pr_debug() call in order to debug 6LoWPAN fragmentation and
reassembly.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: fix first fragment (FRAG1) handling · d991b98f

Committed by Tony Cheneau on Mar 25, 2013

The first fragment, FRAG1, must contain some payload according to the
specs. However, as it is currently written, the first fragment will
remain empty and only contain the 6lowpan headers.

This patch also extracts the transport layer information from the first
fragment. This information is used later on when uncompressing UDP
header.

Thanks to Wolf-Bastian Pöttner for noticing that the offset value was
not properly initialized.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
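
To illustrate the FRAG1 requirement described above, here is a minimal sketch of the RFC 4944 fragment headers in Python. It is not the kernel code: the function names and the `chunk` parameter are hypothetical, and the sketch assumes an uncompressed datagram so that offsets map directly onto payload bytes.

```python
import struct

FRAG1_DISPATCH = 0b11000  # RFC 4944 dispatch bits for the first fragment
FRAGN_DISPATCH = 0b11100  # RFC 4944 dispatch bits for subsequent fragments


def frag1_header(datagram_size, tag):
    """4-byte FRAG1 header: 5-bit dispatch, 11-bit datagram size, 16-bit tag."""
    if not 0 <= datagram_size < 2048:
        raise ValueError("datagram_size must fit in 11 bits")
    return struct.pack("!HH", (FRAG1_DISPATCH << 11) | datagram_size, tag & 0xFFFF)


def fragn_header(datagram_size, tag, offset_bytes):
    """5-byte FRAGN header: as FRAG1 plus an offset counted in 8-octet units."""
    if not 0 <= datagram_size < 2048:
        raise ValueError("datagram_size must fit in 11 bits")
    if offset_bytes % 8:
        raise ValueError("offset must be a multiple of 8 octets")
    return struct.pack("!HHB", (FRAGN_DISPATCH << 11) | datagram_size,
                       tag & 0xFFFF, offset_bytes // 8)


def fragment(datagram, tag, chunk=96):
    """Split a datagram so that FRAG1 itself carries payload (hypothetical helper)."""
    if chunk <= 0 or chunk % 8:
        raise ValueError("chunk must be a positive multiple of 8")
    size = len(datagram)
    # The first fragment carries the first chunk of payload, not just headers.
    frames = [frag1_header(size, tag) + datagram[:chunk]]
    for off in range(chunk, size, chunk):
        frames.append(fragn_header(size, tag, off) + datagram[off:off + chunk])
    return frames
```

For a 200-byte datagram and a 96-byte chunk, this yields one FRAG1 frame and two FRAGN frames, with every frame carrying payload.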

6lowpan: use short IEEE 802.15.4 addresses for broadcast destination · 58ef67c3

Committed by Tony Cheneau on Mar 25, 2013

The IEEE 802.15.4 standard uses the 0xFFFF short address (2 bytes) for message
broadcasting.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: turn on ACK when enabled by the upper layers · cf692061

Committed by Tony Cheneau on Mar 25, 2013

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: always enable link-layer acknowledgments · f333a15a

Committed by Tony Cheneau on Mar 25, 2013

This feature is especially important when using fragmentation, because
the reassembly mechanism cannot recover from the loss of a fragment.

Note that some hardware ignores this flag and will not transmit
acknowledgments even if this is set.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: next header is not properly set upon decompression of a UDP header. · f5c20f58

Committed by Tony Cheneau on Mar 25, 2013

This causes a drop of the UDP packet.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: lowpan_is_iid_16_bit_compressable() does not detect compressible address correctly · 8d879a3f

Committed by Tony Cheneau on Mar 25, 2013

The current test is not RFC6282 compliant. The same issue has been found
and fixed in Contiki. This patch is basically a port of their fix.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
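
As a rough illustration of the RFC 6282 rule involved, an address is 16-bit compressible when its interface identifier has the form 0000:00ff:fe00:XXXX. The Python sketch below checks exactly that; the function name is hypothetical and this is not the kernel macro.

```python
import ipaddress


def is_iid_16_bit_compressible(addr):
    """Return True if the IID (low 64 bits) looks like 0000:00ff:fe00:XXXX."""
    raw = ipaddress.IPv6Address(addr).packed
    # Bytes 8..13 must match the fixed pattern; bytes 14..15 are the
    # 16 bits that would be carried in-line.
    return raw[8:14] == bytes([0x00, 0x00, 0x00, 0xFF, 0xFE, 0x00])
```

For example, `fe80::ff:fe00:1234` passes the check, while `fe80::1234` does not, because its IID lacks the `00ff:fe00` marker.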

netlink: have length check of rtnl msg before deref · de179c8c

Committed by Hong zhi guo on Mar 25, 2013

When the legacy array rtm_min still existed, the length check within
these functions was covered by rtm_min[RTM_NEWTFILTER],
rtm_min[RTM_NEWQDISC] and rtm_min[RTM_NEWTCLASS].

But after Thomas Graf removed rtm_min several days ago, these checks
are missing. Other doit functions should be OK.
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
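
The idea of checking the message length before dereferencing the payload can be sketched in user space as follows. This is an illustration only: it assumes the usual 16-byte `struct nlmsghdr` and 20-byte `struct tcmsg` layouts, and `parse_tcmsg` is a hypothetical name, not a kernel function.

```python
import struct

NLMSG_HDRLEN = 16        # assumed sizeof(struct nlmsghdr)
TCMSG_FMT = "=BBHiIII"   # family, pad1, pad2, ifindex, handle, parent, info
TCMSG_LEN = struct.calcsize(TCMSG_FMT)


def parse_tcmsg(msg):
    """Validate nlmsg_len before touching the tcmsg payload."""
    if len(msg) < NLMSG_HDRLEN:
        raise ValueError("buffer shorter than a netlink header")
    (nlmsg_len,) = struct.unpack_from("=I", msg, 0)
    # The check that must happen before the payload is dereferenced.
    if nlmsg_len < NLMSG_HDRLEN + TCMSG_LEN or nlmsg_len > len(msg):
        raise ValueError("message too short for struct tcmsg")
    return struct.unpack_from(TCMSG_FMT, msg, NLMSG_HDRLEN)
```

A header-only message is rejected instead of being read past its end, which is the behaviour the removed rtm_min table used to guarantee.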

firewire net, ipv6: IPv6 over Firewire (RFC3146) support. · cb6bf355

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Mar 25, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. · 6752c8db

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Mar 25, 2013

Inspection of upper layer protocol is considered harmful, especially
if it is about ARP or other stateful upper layer protocol; driver
cannot (and should not) have full state of them.

IPv4 over Firewire module used to inspect ARP (both in sending path
and in receiving path), and record peer's GUID, max packet size, max
speed and fifo address.  This patch removes such inspection by extending
our "hardware address" definition to include other information as well:
max packet size, max speed and fifo.  By doing this, the neighbour
module in networking subsystem can cache them.

Note: As we have started ignoring sspd and max_rec in ARP/NDP, that
      information will not be used in the driver when sending.

When a packet is being sent, the IP layer fills our pseudo header with
the extended "hardware address", including GUID and fifo.  The driver
can look up node-id (the real but rather volatile low-level address)
by GUID, and then the module can send the packet to the wire using
parameters provided in the extended hardware address.

This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
over IEEE1394 (RFC3146) share the same "hardware address" format
in their address resolution protocols.

Here, extended "hardware address" is defined as follows:

union fwnet_hwaddr {
	u8 u[16];
	struct {
		__be64 uniq_id;		/* EUI-64			*/
		u8 max_rec;		/* max packet size		*/
		u8 sspd;		/* max speed			*/
		__be16 fifo_hi;		/* hi 16bits of FIFO addr	*/
		__be32 fifo_lo;		/* lo 32bits of FIFO addr	*/
	} __packed uc;
};

Note that the hardware address is declared as a union, so that we can map a full
IP address into it when implementing MCAP (Multicast Channel Allocation
Protocol) for IPv6, but the IP and ARP subsystems do not need to know this
format in detail.
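
To make the 16-byte layout of `union fwnet_hwaddr` concrete, the following Python sketch packs and unpacks the same fields in big-endian order. The helper names are hypothetical, and combining `fifo_hi` and `fifo_lo` into one 48-bit integer is a convenience of the sketch, not something the patch does.

```python
import struct

# uniq_id (64 bit), max_rec (8), sspd (8), fifo_hi (16), fifo_lo (32), no padding.
FWNET_HWADDR_FMT = ">QBBHI"


def pack_hwaddr(uniq_id, max_rec, sspd, fifo):
    """Build the 16-byte extended hardware address from its fields."""
    return struct.pack(FWNET_HWADDR_FMT, uniq_id, max_rec, sspd,
                       (fifo >> 32) & 0xFFFF, fifo & 0xFFFFFFFF)


def unpack_hwaddr(raw):
    """Split a 16-byte extended hardware address back into its fields."""
    if len(raw) != struct.calcsize(FWNET_HWADDR_FMT):
        raise ValueError("hardware address must be exactly 16 bytes")
    uniq_id, max_rec, sspd, fifo_hi, fifo_lo = struct.unpack(FWNET_HWADDR_FMT, raw)
    return {"uniq_id": uniq_id, "max_rec": max_rec, "sspd": sspd,
            "fifo": (fifo_hi << 32) | fifo_lo}
```

The packed form is what the `u8 u[16]` view of the union would expose to code that treats the address as opaque bytes.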

One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
is that 1394 ARP Request/Reply do not contain the target hardware address
field (aka ar$tha).  This difference is handled in the ARP subsystem.

CC: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Tunneling: use IP Tunnel stats APIs. · f61dd388

Committed by Pravin B Shelar on Mar 25, 2013

Use common function to calculate rtnl_link_stats64 stats.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPIP: Use ip-tunneling code. · fd58156e

Committed by Pravin B Shelar on Mar 25, 2013

Reuse common ip-tunneling code which is re-factored from GRE
module.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

GRE: Refactor GRE tunneling code. · c5441932

Committed by Pravin B Shelar on Mar 25, 2013

Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Mar, 2013 6 commits

net: Print functions in /proc/net/ptype without the offset. · eaac5f3d

Committed by David S. Miller on Mar 25, 2013

It's always zero.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix ip-header identification for gso packets. · 25c7704d

Committed by Pravin B Shelar on Mar 24, 2013

ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab081
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.
Reported-by: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Revert "udp: increase inner ip header ID during segmentation" · 5594c321

Committed by Pravin B Shelar on Mar 24, 2013

This reverts commit d6a8c36d.
Next commit makes this commit unnecessary.
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ip_gre: increase inner ip header ID during segmentation" · 9cb690d1

Committed by Pravin B Shelar on Mar 24, 2013

This reverts commit 10c0d7ed.
Next commit makes this commit unnecessary.
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: fix freeing of sparse port allocation · 5f64a7db

Committed by Florian Fainelli on Mar 25, 2013

If we have defined a sparse port allocation which is non-contiguous and
contains gaps, the code freeing port_names will just stop when it
encounters the first NULL port_names entry, which is not right. We should
iterate over all possible ports (DSA_MAX_PORTS) until we are done.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
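
The difference between the two freeing strategies can be sketched in Python, with `None` standing in for a NULL `port_names` entry. The function names are hypothetical, and the value of `DSA_MAX_PORTS` is an assumption made for the example.

```python
DSA_MAX_PORTS = 12  # assumed value, for illustration only


def free_ports_stop_at_gap(port_names):
    """Old behaviour: stop at the first empty slot, leaking later entries."""
    freed = []
    i = 0
    while i < DSA_MAX_PORTS and port_names[i] is not None:
        freed.append(port_names[i])
        port_names[i] = None
        i += 1
    return freed


def free_ports_all(port_names):
    """Fixed behaviour: visit every slot and free whatever is present."""
    freed = []
    for i in range(DSA_MAX_PORTS):
        if port_names[i] is not None:
            freed.append(port_names[i])
            port_names[i] = None
    return freed
```

With a table such as `["lan1", None, "lan2", ...]`, the first variant never reaches `"lan2"`, which is the leak the patch describes.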

dsa: factor freeing of dsa_platform_data · 21168245

Committed by Florian Fainelli on Mar 25, 2013

This patch factors the freeing of the struct dsa_platform_data
manipulated by the driver identically in two places to a single
function.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Mar, 2013 6 commits

bridge: avoid br_ifinfo_notify when nothing changed · 7b99a993

Committed by Hong zhi guo on Mar 24, 2013

When neither IFF_BRIDGE nor IFF_BRIDGE_PORT is set,
and afspec == NULL but protinfo != NULL, we run into
"if (err == 0) br_ifinfo_notify(RTM_NEWLINK, p);" with
random value in ret.

Thanks to Sergei for pointing out the error in commit comments.
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: add device tree bindings to register DSA switches · 5e95329b

Committed by Florian Fainelli on Mar 22, 2013

This patch adds support for registering DSA switches using Device Tree
bindings. Note that we support programming the switch routing table even
though no in-tree user seems to require it. I tested this on Armada 370
with a Marvell 88E6172 (not supported by mainline yet).
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling · eec2e618

Committed by Hannes Frederic Sowa on Mar 22, 2013

Hello!

After patch 1 got accepted to net-next I will also send a patch to
netfilter-devel to make the corresponding changes to the netfilter
reassembly logic.

Thanks,

  Hannes

-- >8 --
[PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This patch also ensures that INET_ECN_CE is propagated if one fragment
had the codepoint set.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
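
The CE propagation rule from RFC 3168 section 5.3 can be sketched as below. This is a simplified illustration, not the kernel's lookup table: it covers only the rule about CE, and returning the first fragment's codepoint in all other cases is an assumption of the sketch. The kernel table may treat more combinations as invalid.

```python
# ECN codepoints as encoded in the two low bits of the traffic class / TOS byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassembled_ecn(fragment_ecns):
    """Return the ECN codepoint for the reassembled packet, or None to drop it."""
    if not fragment_ecns:
        raise ValueError("at least one fragment is required")
    seen = set(fragment_ecns)
    if CE in seen:
        # CE must not be lost, but it must not be set either if any
        # fragment was Not-ECT: in that case the packet is dropped.
        return None if NOT_ECT in seen else CE
    # Simplification: keep the first fragment's codepoint otherwise.
    return fragment_ecns[0]
```

So a datagram whose fragments carry ECT(0), CE and ECT(0) is reassembled with CE set, while a mix of Not-ECT and CE is dropped.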

inet: generalize ipv4-only RFC3168 5.3 ecn fragmentation handling for future use by ipv6 · be991971

Committed by Hannes Frederic Sowa on Mar 22, 2013

This patch just moves some code around to make the ip4_frag_ecn_table
and IPFRAG_ECN_* constants accessible from the other reassembly engines. I
also renamed ip4_frag_ecn_table to ip_frag_ecn_table.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: provide addr and netconf dump consistency info · 63998ac2

Committed by Nicolas Dichtel on Mar 22, 2013

This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).
Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: provide addr and netconf dump consistency info · 0465277f

Committed by Nicolas Dichtel on Mar 22, 2013

This patch takes benefit of dev_addr_genid and dev_base_seq to check if a change
occurs during a netlink dump. If a change is detected, the flag NLM_F_DUMP_INTR
is set in the first message after the dump was interrupted.

Note that seq and prev_seq must be reset between each family in rtnl_dump_all()
because they are specific to each family.
Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
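
The consistency check used by both the IPv4 and IPv6 dumps can be modelled roughly as follows. The sketch assumes a single generation counter in place of the combination of dev_addr_genid and dev_base_seq. `AddrTable`, `dump` and the `after_chunk` hook are hypothetical names used only to simulate a change arriving in the middle of a dump.

```python
NLM_F_DUMP_INTR = 0x10  # flag set on the first message after an interruption


class AddrTable:
    """Toy address table with a generation counter bumped on every change."""

    def __init__(self):
        self.addrs = []
        self.genid = 0

    def add(self, addr):
        self.addrs.append(addr)
        self.genid += 1


def dump(table, chunk_size, after_chunk=None):
    """Return (flags, addresses) messages, flagging a mid-dump change."""
    messages = []
    prev_seq = None
    pos = 0
    while pos < len(table.addrs):
        seq = table.genid
        changed = prev_seq is not None and seq != prev_seq
        flags = NLM_F_DUMP_INTR if changed else 0
        prev_seq = seq
        messages.append((flags, table.addrs[pos:pos + chunk_size]))
        pos += chunk_size
        if after_chunk is not None:
            after_chunk(len(messages))  # simulate a concurrent modification
    return messages
```

Only the first message emitted after the change carries the flag; later messages of the same dump are unflagged again, matching the description above.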

23 Mar, 2013 2 commits

l2tp: calling the ref() instead of deref() · 1b7c92b9

Committed by Dan Carpenter on Mar 22, 2013

This is a cut and paste typo.  We call ->ref() a second time instead
of ->deref().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: Move rtm_dn_policy to dn_route to make it available if !CONFIG_DECNET_ROUTER · 2fa70df9

Committed by Thomas Graf on Mar 22, 2013

Otherwise build fails with CONFIG_DECNET && !CONFIG_DECNET_ROUTER
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Mar, 2013 7 commits

tcp: preserve ACK clocking in TSO · f4541d60

Committed by Eric Dumazet on Mar 21, 2013

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Remove passing of attributes into rtnl_doit functions · 661d2967

Committed by Thomas Graf on Mar 21, 2013

With decnet converted, we can finally get rid of rta_buf and the
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: Parse netlink attributes on our own · 58d7d8f9

Committed by Thomas Graf on Mar 21, 2013

decnet is the only subsystem left that is relying on the global
netlink attribute buffer rta_buf. It's horrible design and we
want to get rid of it.

This converts all of decnet to do implicit attribute parsing. It
also gets rid of the error prone struct dn_kern_rta.

Yes, the fib_magic() stuff is not pretty.

It's compile tested but I need someone with appropriate hardware
to test the patch since I don't have access to it.

Cc: linux-decnet-user@lists.sourceforge.net
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
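
Parsing attributes per handler, rather than through a shared global buffer, amounts to walking the type-length-value stream locally. The Python sketch below shows that walk under the usual netlink attribute layout (16-bit length, 16-bit type, payload padded to 4 bytes). `put_attr` and `parse_attrs` are hypothetical helpers, not kernel functions.

```python
import struct

NLA_HDRLEN = 4
NLA_TYPE_MASK = 0x3FFF  # strips the nested / byte-order flag bits


def nla_align(length):
    """Round a length up to the 4-byte attribute alignment."""
    return (length + 3) & ~3


def put_attr(attr_type, payload):
    """Encode one attribute, padded to the alignment boundary."""
    length = NLA_HDRLEN + len(payload)
    return struct.pack("=HH", length, attr_type) + payload + bytes(nla_align(length) - length)


def parse_attrs(buf, max_type):
    """Return {type: payload} for known types; unknown types are skipped."""
    attrs = {}
    off = 0
    while off + NLA_HDRLEN <= len(buf):
        nla_len, nla_type = struct.unpack_from("=HH", buf, off)
        if nla_len < NLA_HDRLEN or off + nla_len > len(buf):
            raise ValueError("malformed attribute")
        nla_type &= NLA_TYPE_MASK
        if 0 < nla_type <= max_type:
            attrs[nla_type] = buf[off + NLA_HDRLEN:off + nla_len]
        off += nla_align(nla_len)
    return attrs
```

Each caller gets its own result dictionary, so nothing is shared between message handlers.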

udp: increase inner ip header ID during segmentation · d6a8c36d

Committed by Cong Wang on Mar 22, 2013

Similar to GRE tunnel, UDP tunnel should take care of IP header ID
too.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_gre: increase inner ip header ID during segmentation · 10c0d7ed

Committed by Cong Wang on Mar 22, 2013

According to the previous discussion [1] on netdev list, DaveM insists
we should increase the IP header ID for each segmented packet.
This patch fixes it.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

1. http://marc.info/?t=136384172700001&r=1&w=2
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Diag core and basic socket info dumping (v2) · eaaa3139

Committed by Andrey Vagin on Mar 21, 2013

The netlink_diag can be built as a module, just like it's done in
unix sockets.

The core dumping message carries the basic info about netlink sockets:
family, type and protocol, portid, dst_group, dst_portid, state.

Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.

Netlink sockets can be filtered by protocols.

The socket inode number and cookie are reserved for future per-socket info
retrieval. The per-protocol filtering is also reserved for future use by
requiring the sdiag_protocol to be zero.

The file /proc/net/netlink doesn't provide enough information for
dumping netlink sockets. It doesn't provide dst_group, dst_portid,
groups above 32.

v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: prepare netlink code for netlink diag · 0f29c768

Committed by Andrey Vagin on Mar 21, 2013

Move a few declarations in a header.
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Mar, 2013 5 commits

net: remove redundant ifdef CONFIG_CGROUPS · 4021db9a

Committed by Zefan Li on Mar 20, 2013

The cgroup code has been surrounded by ifdef CONFIG_NET_CLS_CGROUP
and CONFIG_NETPRIO_CGROUP.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: implement RFC5682 F-RTO · e33099f9

Committed by Yuchung Cheng on Mar 20, 2013

This patch implements F-RTO (forward RTO recovery):

When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted.  Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.

The implementation follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss().  The basic version is not supported
because SACK enhanced version also works for non-SACK connections.

The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.

Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
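
The decision F-RTO makes on the ACK that follows the newly sent data can be sketched as a small classifier. This is a loose model of the description above, not the logic in tcp_process_loss(): the function and parameter names are hypothetical, only the first segment is assumed to have been retransmitted, and sequence number wraparound is ignored.

```python
def frto_verdict(retrans_end, high_seq, ack, sack_blocks=()):
    """Classify the timeout from the ACK that follows the new data.

    retrans_end: sequence number just past the single retransmitted segment
    high_seq:    highest sequence sent before the timeout; new data starts here
    ack:         cumulative ACK number carried by the incoming ACK
    sack_blocks: iterable of (start, end) SACK ranges carried by the ACK
    """
    if ack > retrans_end:
        # Data that was never retransmitted got acknowledged: spurious timeout.
        return "spurious"
    if any(end > high_seq for _start, end in sack_blocks):
        # Only the new data was selectively acknowledged: the loss was real.
        return "genuine"
    return "undecided"
```

In the spurious case the sender would revert its congestion state; in the genuine case conventional loss recovery continues.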

tcp: refactor CA_Loss state processing · ab42d9ee

Committed by Yuchung Cheng on Mar 20, 2013

Consolidate all of TCP CA_Loss state processing in
tcp_fastretrans_alert() into a new function called tcp_process_loss().
This is to prepare the new F-RTO implementation in the next patch.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: refactor F-RTO · 9b44190d

Committed by Yuchung Cheng on Mar 20, 2013

The patch series refactors the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features.  It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path.  F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determine how many to send
independently.  F-RTO only chooses what to send in order to detect
spurious retransmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dynticks: avoid flow_cache_flush() interrupting every core · 8fdc929f

Committed by Chris Metcalf on Mar 19, 2013

Previously, if you did an "ifconfig down" or similar on one core, and
the kernel had CONFIG_XFRM enabled, every core would be interrupted to
check its percpu flow list for items that could be garbage collected.

With this change, we generate a mask of cores that actually have any
percpu items, and only interrupt those cores. When we are trying to
isolate a set of cpus from interrupts, this is important to do.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
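
The core of the change, building a mask of only those cores that hold per-CPU flow entries, can be sketched as follows. The function name and the list-of-counts input are assumptions of the sketch, and the kernel works with cpumask structures rather than Python integers.

```python
def cpus_to_interrupt(percpu_flow_counts):
    """Return a bitmask with bit N set when CPU N has cached flow entries."""
    mask = 0
    for cpu, count in enumerate(percpu_flow_counts):
        if count > 0:
            mask |= 1 << cpu
    return mask
```

Cores whose per-CPU list is empty stay out of the mask and are therefore not interrupted, which is what keeps isolated CPUs undisturbed.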