- 18 Dec 2013, 1 commit
-
-
By Francesco Fusco

We introduce a new hashing library that is meant to be used in contexts where speed is more important than uniformity of the hashed values. The hash library leverages architecture-specific implementations to achieve high performance and falls back to jhash() for the generic case. On Intel-based x86 architectures, the library can exploit the crc32l instruction, part of the Intel SSE4.2 instruction set, if the instruction is supported by the processor. This implementation is twice as fast as the jhash() implementation on an i7 processor. Additional architectures, such as ARM64, provide instructions for accelerating the computation of CRC, so they could be added as well in follow-up work.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
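As a rough illustration of the approach (the function names, the boot-time selection, and the omitted tail handling below are assumptions for this sketch, not the patch's actual interface):

    /* Sketch: select a CRC32-based hash at boot when SSE4.2 is
     * present, otherwise fall back to the generic jhash().
     * Tail bytes (len % 4) are ignored here for brevity. */
    #include <linux/jhash.h>

    static u32 (*fast_hash)(const void *data, u32 len, u32 seed);

    #ifdef CONFIG_X86
    static u32 crc32_hash(const void *data, u32 len, u32 seed)
    {
            const u32 *p = data;
            u32 crc = seed;

            for (; len >= 4; len -= 4, p++)
                    asm("crc32l %1, %0" : "+r" (crc) : "rm" (*p));
            return crc;
    }
    #endif

    static int __init fast_hash_init(void)
    {
    #ifdef CONFIG_X86
            if (boot_cpu_has(X86_FEATURE_XMM4_2)) {
                    fast_hash = crc32_hash;
                    return 0;
            }
    #endif
            fast_hash = jhash;      /* generic fallback */
            return 0;
    }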
-
- 14 Dec 2013, 10 commits
-
-
By dingtianhong

bond_first_slave_rcu() will be used instead of bond_first_slave() under rcu_read_lock(). Following Jay Vosburgh's suggestion, struct netdev_adjacent should stay hidden from users who would otherwise use it directly, so a new helper is packaged to get the first slave of the bond.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
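A plausible shape for the new helper (a sketch inferred from the description above; the in-tree definition may differ):

    /* Sketch: return the first slave of a bond under rcu_read_lock(),
     * or NULL if the bond has no slaves, without exposing struct
     * netdev_adjacent to the caller. */
    static inline struct slave *bond_first_slave_rcu(struct bonding *bond)
    {
            return netdev_lower_get_first_private_rcu(bond->dev);
    }

Callers are expected to hold rcu_read_lock() for as long as they use the returned slave.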
-
Add IFLA_BOND_ARP_ALL_TARGETS to allow get/set of bonding parameter arp_all_targets via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_ARP_VALIDATE to allow get/set of bonding parameter arp_validate via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_ARP_IP_TARGET to allow get/set of bonding parameter arp_ip_target via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_ARP_INTERVAL to allow get/set of bonding parameter arp_interval via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_USE_CARRIER to allow get/set of bonding parameter use_carrier via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_DOWNDELAY to allow get/set of bonding parameter downdelay via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_UPDELAY to allow get/set of bonding parameter updelay via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add IFLA_BOND_MIIMON to allow get/set of bonding parameter miimon via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
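All of the IFLA_BOND_* commits above share one pattern: a new attribute ID in the uapi header, a fill handler exporting the parameter, and a changelink handler accepting it. A condensed, illustrative fill-side sketch (the real bond_fill_info() covers every option plus more error handling):

    /* Sketch: export a few bond parameters as netlink attributes. */
    static int bond_fill_info(struct sk_buff *skb,
                              const struct net_device *bond_dev)
    {
            struct bonding *bond = netdev_priv(bond_dev);

            if (nla_put_u32(skb, IFLA_BOND_MIIMON, bond->params.miimon) ||
                nla_put_u32(skb, IFLA_BOND_ARP_INTERVAL,
                            bond->params.arp_interval) ||
                nla_put_u8(skb, IFLA_BOND_USE_CARRIER,
                           bond->params.use_carrier))
                    return -EMSGSIZE;
            return 0;
    }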
-
By stephen hemminger

Add support to netconf to show changes to proxy-arp status on a per-interface basis via netlink, in a manner similar to forwarding and reverse-path state.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Dec 2013, 2 commits
-
-
By Jerry Chu

This patch modifies the GRO stack to avoid the use of "network_header" and associated macros like ip_hdr() and ipv6_hdr(), in order to allow an arbitrary number of IP headers (v4 or v6) to be used in the encapsulation chain. This lays the foundation for various IP tunneling support (IP-in-IP, GRE, VXLAN, SIT, ...) to be added later.

With this patch, GRO traversal is mostly based on skb_gro_offset rather than special header offsets saved in the skb (e.g., skb->network_header). As a result, all but the top layer (i.e., the transport layer) must have headers of the same length in order for a packet to be considered for aggregation. Therefore, when adding a new encapsulation layer (e.g., for tunneling), one must check for and skip flows (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a different header length.

Note that unlike the network header, the transport header can and will continue to be set by the GRO code, since there will be at most one "transport layer" in the encapsulation chain.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
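To make the constraint concrete, a tunnel gro_receive handler under this scheme might look roughly like this (illustrative pseudo-implementation, not the code that was merged):

    /* Sketch: an IP-in-IP GRO receive hook that works purely off
     * skb_gro_offset(), never touching skb->network_header. */
    static struct sk_buff **ipip_gro_receive(struct sk_buff **head,
                                             struct sk_buff *skb)
    {
            const struct iphdr *iph;

            /* (the slow-path re-pull via skb_gro_header_slow() is
             * omitted in this sketch) */
            iph = skb_gro_header_fast(skb, skb_gro_offset(skb));
            if (unlikely(ip_is_fragment(iph))) {
                    NAPI_GRO_CB(skb)->flush = 1;    /* not aggregatable */
                    return NULL;
            }

            /* Peel the outer header; since every held flow must have
             * identical header lengths below the transport layer, a
             * flow with a different length would have to be marked
             * NAPI_GRO_CB(p)->same_flow = 0 here. */
            skb_gro_pull(skb, iph->ihl * 4);
            return inet_gro_receive(head, skb);
    }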
-
By Vlad Yasevich

Now that macvlan and macvtap use the same receive and forward handlers, we can remove them completely and use netif_rx() and dev_forward_skb() directly.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Dec 2013, 2 commits
-
-
By Jiri Benc

RFC 4191 states in section 3.5:

    When a host avoids using any non-reachable router X and instead
    sends a data packet to another router Y, and the host would have
    used router X if router X were reachable, then the host SHOULD
    probe each such router X's reachability by sending a single
    Neighbor Solicitation to that router's address. A host MUST NOT
    probe a router's reachability in the absence of useful traffic
    that the host would have sent to the router if it were reachable.
    In any case, these probes MUST be rate-limited to no more than one
    per minute per router.

Currently, when the neighbour corresponding to a router falls into NUD_FAILED, it is never considered again. Introduce a new rt6_nud_state value, RT6_NUD_FAIL_PROBE, which indicates the route should not be used but should be probed with a single NS. The probe is rate-limited by the existing code. To better distinguish the meanings of the failure values, rename RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
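The resulting classification can be pictured like this (sketch; the actual helper in net/ipv6/route.c carries more context):

    /* Sketch: per-route neighbour classification after the change. */
    enum rt6_nud_state {
            RT6_NUD_FAIL_HARD = -3,
            RT6_NUD_FAIL_PROBE = -2,   /* don't use; send a single NS */
            RT6_NUD_FAIL_DO_RR = -1,   /* don't use; try round-robin */
            RT6_NUD_SUCCEED = 1,
    };

    static enum rt6_nud_state rt6_classify_neigh(const struct neighbour *neigh)
    {
            if (neigh->nud_state & NUD_VALID)
                    return RT6_NUD_SUCCEED;
            if (neigh->nud_state & NUD_FAILED)
                    return RT6_NUD_FAIL_PROBE; /* probe is rate-limited
                                                * by existing code */
            return RT6_NUD_FAIL_DO_RR;
    }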
-
By Nicolas Dichtel

The help text of this function says: "in_dev: only on this interface, 0=any interface", but since commit 39a6d063 ("[NETNS]: Process inet_confirm_addr in the correct namespace."), the code assumes that it will never be NULL. This function is never called with in_dev == NULL, but it is exported and may be used by an external module. Because this patch restores the ability to call inet_confirm_addr() with in_dev == NULL, I partially revert the above commit, as suggested by Julian.

CC: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Dec 2013, 4 commits
-
-
By Ying Xue

TIPC is currently using the field 'af_packet_priv' in struct net_device as a handle to find the bearer instance associated with the given network device. But by doing so it is blocking other networking cleanups, such as the one discussed here:

http://patchwork.ozlabs.org/patch/178044/

This commit removes this usage from TIPC. Instead, we introduce a new field, 'tipc_ptr', in the net_device structure to serve this purpose. When a TIPC bearer is enabled, the bearer object is attached to 'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall from a networking device, the bearer object can now be obtained from 'tipc_ptr'. When a bearer is disabled, the bearer object is detached from its underlying network device by setting 'tipc_ptr' to NULL.

Additionally, an RCU lock is used to protect the new pointer. Henceforth, the existing tipc_net_lock is used in write mode to serialize write accesses to this pointer, while the new RCU lock is applied on the read side to ensure that the pointer is 100% valid within its wrapped area for all readers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
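A sketch of the read side described above (illustrative; the real receive path performs more checks):

    /* Sketch: look up the TIPC bearer attached to a net_device from
     * the packet receive upcall.  Writers update dev->tipc_ptr under
     * tipc_net_lock; readers only need rcu_read_lock(). */
    static int recv_msg(struct sk_buff *skb, struct net_device *dev,
                        struct packet_type *pt, struct net_device *orig_dev)
    {
            struct tipc_bearer *b;

            rcu_read_lock();
            b = rcu_dereference(dev->tipc_ptr);
            if (likely(b))
                    tipc_recv_msg(skb, b);  /* bearer still enabled */
            else
                    kfree_skb(skb);         /* bearer was disabled */
            rcu_read_unlock();
            return 0;
    }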
-
By Jiri Pirko

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Paul Moore

Don't needlessly recompute 'opt[opt_iter + 1]', as we already have it stored in 'tag_len'.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
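In other words, a change of this shape (illustrative):

    /* Before: the tag length is loaded from the option buffer twice. */
    tag_len = opt[opt_iter + 1];
    /* ... */
    opt_iter += opt[opt_iter + 1];   /* redundant second load */

    /* After: reuse the value already cached in tag_len. */
    tag_len = opt[opt_iter + 1];
    /* ... */
    opt_iter += tag_len;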
-
- 10 Dec 2013, 15 commits
-
-
By Florent Fourcot

And use it if possible.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Florent Fourcot

The tclass information is now already stored in rcv_flowinfo; we do not need to store the same information twice.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Florent Fourcot

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Florent Fourcot

The current implementation of IPV6_FLOWINFO only gives a result if pktoptions is available (thanks to the ip6_datagram_recv_ctl function). It gives inconsistent results to user space: sometimes there is a result for getsockopt(IPV6_FLOWINFO), sometimes not. This patch adds rcv_flowinfo to store it, and returns it to user space in the same way as the other packet options.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
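From user space this could then be queried along these lines (hypothetical snippet; the fallback define is an assumption, IPV6_FLOWINFO normally comes from <linux/in6.h>):

    /* Hypothetical: read the flow info (tclass + flow label) of the
     * last packet received on an IPv6 socket. */
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <stdint.h>

    #ifndef IPV6_FLOWINFO
    #define IPV6_FLOWINFO 11        /* from <linux/in6.h> */
    #endif

    static uint32_t get_rcv_flowinfo(int fd)
    {
            uint32_t val = 0;
            socklen_t len = sizeof(val);

            if (getsockopt(fd, IPPROTO_IPV6, IPV6_FLOWINFO, &val, &len) < 0)
                    return 0;       /* kernel too old, or nothing received */
            return val;
    }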
-
By Joe Perches

If CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set, several is_<foo>_ether_addr functions can be slightly improved by using u32 dereferences. I believe all current uses of is_zero_ether_addr and is_broadcast_ether_addr are u16-aligned, so always use u16 references to improve those functions' performance. Document the u16 alignment requirements.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Joe Perches

Add a generic routine to test whether two Ethernet addresses that may not be u16-aligned are equal. If CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set, this uses the slightly faster generic routine ether_addr_equal; otherwise it uses memcmp.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
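The helper is then essentially (sketch matching the description; see include/linux/etherdevice.h for the merged version):

    /* Sketch: compare two 6-byte Ethernet addresses with no alignment
     * assumption. */
    static inline bool ether_addr_equal_unaligned(const u8 *addr1,
                                                  const u8 *addr2)
    {
    #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
            return ether_addr_equal(addr1, addr2);
    #else
            return memcmp(addr1, addr2, ETH_ALEN) == 0;
    #endif
    }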
-
By Jiri Pirko

Make the behaviour similar to ipv4. This will allow the user to set sysctl default neigh param values, and these values will be respected even by devices registered earlier (the ones that do not have an address set yet).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

Previously, inet devices were only constructed when addresses were added. Therefore the default neigh parms values they get are the ones at the time of these operations. Now that we're creating inet devices earlier, this changes the behaviour of default neigh parms values in an incompatible way (see bug #8519). This patch creates a compromise by setting the default values at the same point as before, but only for those that have not been explicitly set by the user since the inet device's creation.

Introduced by:

    commit 8030f544
    Author: Herbert Xu <herbert@gondor.apana.org.au>
    Date:   Thu Feb 22 01:53:47 2007 +0900

    [IPV4] devinet: Register inetdev earlier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

This will be needed later on to provide better management of default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

This patch converts the neigh param members to an array. This allows easier manipulation, which will be needed later on to provide better management of default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
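The array form makes a generic accessor possible, along the lines of (sketch; the enum entries shown are examples):

    /* Sketch: neigh parameters as an enum-indexed array instead of
     * individual struct members, so defaults can be handled
     * generically per attribute. */
    enum {
            NEIGH_VAR_MCAST_PROBES,
            NEIGH_VAR_UCAST_PROBES,
            NEIGH_VAR_RETRANS_TIME,
            /* ... one entry per former member ... */
            NEIGH_VAR_DATA_MAX,
    };

    struct neigh_parms {
            /* ... */
            int data[NEIGH_VAR_DATA_MAX];
    };

    #define NEIGH_VAR(p, attr) ((p)->data[NEIGH_VAR_ ## attr])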
-
By Florian Fainelli

The PHY library already reads the MII_STAT1000 and MII_LPA registers in genphy_read_status(), so extend it to also populate the PHY device link-partner advertised features, such that we can feed this back into ethtool when asked for it in phy_ethtool_gset().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Daniel Borkmann

This patch introduces a PACKET_QDISC_BYPASS socket option that allows for using a similar xmit() function as in pktgen, instead of taking the dev_queue_xmit() path. This can be very useful when PF_PACKET applications are required to be used in a similar scenario as pktgen, but with a full, flexible packet payload that needs to be provided, for example.

By default, nothing changes in behaviour for normal PF_PACKET TX users, so everything stays as is for applications. New users, however, can now set PACKET_QDISC_BYPASS if needed, i) to prevent their own packets from reentering packet_rcv() and ii) to directly push the frame to the driver. In doing so we can increase pps (here, 64-byte packets) for PF_PACKET a bit:

    # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
     1 CPU  ==  1,509,628 pps --  1,208,708 -- 1,247,436
     2 CPUs ==  3,198,659 pps --  2,536,012 -- 1,605,779
     3 CPUs ==  4,787,992 pps --  3,788,740 -- 1,735,610
     4 CPUs ==  6,173,956 pps --  4,907,799 -- 1,909,114
     5 CPUs ==  7,495,676 pps --  5,956,499 -- 2,014,422
     6 CPUs ==  9,001,496 pps --  7,145,064 -- 2,155,261
     7 CPUs == 10,229,776 pps --  8,190,596 -- 2,220,619
     8 CPUs == 11,040,732 pps --  9,188,544 -- 2,241,879
     9 CPUs == 12,009,076 pps -- 10,275,936 -- 2,068,447
    10 CPUs == 11,380,052 pps -- 11,265,337 -- 1,578,689
    11 CPUs == 11,672,676 pps -- 11,845,344 -- 1,297,412
    [...]
    20 CPUs == 11,363,192 pps -- 11,014,933 -- 1,245,081

    [**]: qdisc path with packet_rcv(), how probably most people
          seem to use it (hopefully not anymore if not needed)

The test was done using a modified trafgen, sending a simple static 64-byte packet, on all CPUs. The trick in the fast "qdisc path" case is to avoid reentering packet_rcv() by setting the RAW socket protocol to zero, like: socket(PF_PACKET, SOCK_RAW, 0);

Tradeoffs are documented as well in this patch: clearly, if queues are busy, we will drop more packets, tc disciplines are ignored, and these packets are not visible to taps anymore. For a pktgen-like scenario, we argue that this is acceptable.

The pointer to the xmit function has been placed in the packet socket structure hole between cached_dev and prot_hook, which is hot anyway, as we're working on cached_dev in each send path.

Done in joint work together with Jesper Dangaard Brouer.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
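A minimal user-space view of the new knob (hypothetical snippet; PACKET_QDISC_BYPASS comes from <linux/if_packet.h>, error handling trimmed):

    /* Hypothetical PF_PACKET sender opting out of the qdisc layer. */
    #include <sys/socket.h>
    #include <linux/if_packet.h>

    static int open_bypass_socket(void)
    {
            /* proto 0 avoids reentering packet_rcv() on TX */
            int fd = socket(PF_PACKET, SOCK_RAW, 0);
            int one = 1;

            /* Frames now go straight to the driver: tc disciplines are
             * skipped and taps no longer see these packets. */
            setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
                       &one, sizeof(one));
            return fd;
    }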
-
By Daniel Borkmann

As we need it elsewhere, move the inline helper function skb_needs_linearize() over to the skbuff.h include file. While at it, also convert the return type to 'bool' instead of 'int' and add proper kernel doc.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
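The moved helper amounts to (sketch consistent with the description):

    /* Sketch: does the skb need linearizing before this device can
     * transmit it, given the device's feature flags? */
    static inline bool skb_needs_linearize(struct sk_buff *skb,
                                           netdev_features_t features)
    {
            return skb_is_nonlinear(skb) &&
                   ((skb_has_frag_list(skb) &&
                     !(features & NETIF_F_FRAGLIST)) ||
                    (skb_shinfo(skb)->nr_frags &&
                     !(features & NETIF_F_SG)));
    }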
-
By Eric Dumazet

Commit 6da7c8fc ("qdisc: allow setting default queuing discipline") added the ability to change the default qdisc from pfifo_fast to, say, fq. But as most modern ethernet devices are multiqueue, we can't really see all the statistics from "tc -s qdisc show", as the default root qdisc is mq. This patch adds the calls to qdisc_list_add() to mq and mqprio.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Dec 2013, 6 commits
-
-
By Joe Perches

Add a new check for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to reduce the number of or's used in the ether_addr_equal comparison, to very slightly improve function performance. Simplify the ether_addr_equal_64bits implementation. Integrate and remove the zap_last_2bytes helper, as it's now used only once. Remove the now unused compare_ether_addr function. Update the unaligned-memory-access documentation to remove the compare_ether_addr description and show how unaligned accesses could occur with ether_addr_equal.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

Creating an address with this flag set will result in the kernel taking care of temporary addresses in the same way as if the address were created by the kernel itself (after RA receive). This allows userspace applications implementing autoconfiguration (NetworkManager, for example) to implement IPv6 address privacy.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jiri Pirko

There is no more space in the u8 ifa_flags. So do what davem suggested and add another netlink attribute, called IFA_FLAGS, to carry more flags.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
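On the dump side the new attribute then travels next to the legacy field, roughly (sketch; the helper name is illustrative):

    /* Sketch: emit the full 32-bit flags via IFA_FLAGS while keeping
     * the truncated u8 for old user space. */
    static int put_ifa_flags(struct sk_buff *skb, struct ifaddrmsg *ifm,
                             u32 flags)
    {
            ifm->ifa_flags = (u8)flags;                 /* legacy, truncated */
            return nla_put_u32(skb, IFA_FLAGS, flags);  /* full width */
    }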
-
By Eric Dumazet

Some network drivers use the dev_kfree_skb_any() and dev_kfree_skb_irq() helpers to free skbs, both for dropped packets and TX-completed ones. We need to separate the two causes to get better diagnostics from dropwatch or "perf record -e skb:kfree_skb".

This patch provides two new helpers, dev_consume_skb_any() and dev_consume_skb_irq(), to be used for consumed skbs. __dev_kfree_skb_irq() is slightly optimized to remove one atomic_dec_and_test() in the fast path, and uses this_cpu_{r|w} accessors.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
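In a driver the split looks roughly like this (hypothetical example; the mydrv_* helpers are invented for illustration):

    /* Hypothetical TX-completion handler: freeing a transmitted skb is
     * a "consume", freeing a dropped one stays a "kfree", so dropwatch
     * and the skb:kfree_skb tracepoint only flag real drops. */
    static void mydrv_tx_complete(struct mydrv_ring *ring)
    {
            struct sk_buff *skb;

            while ((skb = mydrv_next_done(ring)) != NULL) {
                    if (mydrv_tx_ok(ring, skb))
                            dev_consume_skb_any(skb);  /* sent fine */
                    else
                            dev_kfree_skb_any(skb);    /* real drop */
            }
    }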
-
By Florian Fainelli

Break down the PHY_*_FEATURES into per-speed defines such that we can easily re-use them individually.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Eric Dumazet

With the introduction of TCP Small Queues, TSO auto sizing, and TCP pacing, we can implement Automatic Corking in the kernel, to help applications doing small write()/sendmsg() to TCP sockets.

The idea is to change tcp_push() to check if the current skb payload is under the skb optimal size (a multiple of MSS bytes). If under 'size_goal', and at least one packet is still in the Qdisc or NIC TX queues, set the TCP Small Queue Throttled bit, so that the push will be delayed up to TX completion time. This delay might allow the application to coalesce more bytes into the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay depends on the dynamics of the system, and might be zero if no packet for this flow is actually held in the Qdisc or NIC TX ring. Using FQ/pacing is a way to increase the probability of autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control this feature, and default it to 1 (enabled).

Add a new SNMP counter: nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected the skb was underused and its flush was deferred.

Tested: Interesting effects when using line-buffered commands under ssh. Excellent performance results in terms of cpu usage and total throughput.

    lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    9410.39

     Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

          35209.439626 task-clock              #  2.901 CPUs utilized
                 2,294 context-switches        #  0.065 K/sec
                   101 CPU-migrations          #  0.003 K/sec
                 4,079 page-faults             #  0.116 K/sec
        97,923,241,298 cycles                  #  2.781 GHz                     [83.31%]
        51,832,908,236 stalled-cycles-frontend # 52.93% frontend cycles idle    [83.30%]
        25,697,986,603 stalled-cycles-backend  # 26.24% backend cycles idle     [66.70%]
       102,225,978,536 instructions            #  1.04 insns per cycle
                                               #  0.51 stalled cycles per insn  [83.38%]
        18,657,696,819 branches                # 529.906 M/sec                  [83.29%]
            91,679,646 branch-misses           #  0.49% of all branches         [83.40%]

          12.136204899 seconds time elapsed

    lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    6624.89

     Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

          40045.864494 task-clock              #  3.301 CPUs utilized
                   171 context-switches        #  0.004 K/sec
                    53 CPU-migrations          #  0.001 K/sec
                 4,080 page-faults             #  0.102 K/sec
       111,340,458,645 cycles                  #  2.780 GHz                     [83.34%]
        61,778,039,277 stalled-cycles-frontend # 55.49% frontend cycles idle    [83.31%]
        29,295,522,759 stalled-cycles-backend  # 26.31% backend cycles idle     [66.67%]
       108,654,349,355 instructions            #  0.98 insns per cycle
                                               #  0.57 stalled cycles per insn  [83.34%]
        19,552,170,748 branches                # 488.244 M/sec                  [83.34%]
           157,875,417 branch-misses           #  0.81% of all branches         [83.34%]

          12.130267788 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
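The gist of the tcp_push() change can be sketched as (simplified from the description above):

    /* Sketch: defer the push when the skb is still below size_goal and
     * this flow already has packets sitting in a qdisc or NIC ring. */
    static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
                                    int size_goal)
    {
            return skb->len < size_goal &&
                   sysctl_tcp_autocorking &&
                   skb != tcp_write_queue_head(sk) &&   /* not the only skb */
                   atomic_read(&sk->sk_wmem_alloc) > skb->truesize;
    }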
-