提交 · 16a16cd35ee29d9bea54dd60e55d9c1cc685a37d · openeuler / raspberrypi-kernel

05 2月, 2017 6 次提交

net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH · 16a16cd3

由 David Ahern 提交于 2月 02, 2017

If an entire multipath route is deleted using prefix and len (without any
nexthops), send a single RTM_DELROUTE notification with the full route
using RTA_MULTIPATH. This is done by generating the skb before the route
delete when all of the sibling routes are still present but sending it
after the route has been removed from the FIB. The skip_notify flag
is used to tell the lower fib code not to send notifications for the
individual nexthop routes.

If a route is deleted using RTA_MULTIPATH for any nexthops or a single
nexthop entry is deleted, then the nexthops are deleted one at a time with
notifications sent as each hop is deleted. This is necessary given that
IPv6 allows individual hops within a route to be deleted.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16a16cd3

net: ipv6: Change notifications for multipath add to RTA_MULTIPATH · 3b1137fe

由 David Ahern 提交于 2月 02, 2017

Change ip6_route_multipath_add to send one notifciation with the full
route encoded with RTA_MULTIPATH instead of a series of individual routes.
This is done by adding a skip_notify flag to the nl_info struct. The
flag is used to skip sending of the notification in the fib code that
actually inserts the route. Once the full route has been added, a
notification is generated with all nexthops.

ip6_route_multipath_add handles 3 use cases: new routes, route replace,
and route append. The multipath notification generated needs to be
consistent with the order of the nexthops and it should be consistent
with the order in a FIB dump which means the route with the first nexthop
needs to be used as the route reference. For the first 2 cases (new and
replace), a reference to the route used to send the notification is
obtained by saving the first route added. For the append case, the last
route added is used to loop back to its first sibling route which is
the first nexthop in the multipath route.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b1137fe

net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute · beb1afac

由 David Ahern 提交于 2月 02, 2017

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing multipath routes to be returned using the
RTA_MULTIPATH attribute.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

beb1afac

net: ipv6: Allow shorthand delete of all nexthops in multipath route · 0ae81335

由 David Ahern 提交于 2月 02, 2017

IPv4 allows multipath routes to be deleted using just the prefix and
length. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
        nexthop via 10.100.1.254  dev eth1 weight 1
        nexthop via 10.11.200.2  dev eth11.200 weight 1
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

    $ ip ro del 1.1.1.0/24 vrf red

    $ ip ro ls vrf red
    unreachable default metric 8192
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

The same notation does not work with IPv6 because of how multipath routes
are implemented for IPv6. For IPv6 only the first nexthop of a multipath
route is deleted if the request contains only a prefix and length. This
leads to unnecessary complexity in userspace dealing with IPv6 multipath
routes.

This patch allows all nexthops to be deleted without specifying each one
in the delete request. Internally, this is done by walking the sibling
list of the route matching the specifications given (prefix, length,
metric, protocol, etc).

    $  ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024  pref medium
    2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024  pref medium
    ...

    $ ip -6 ro del vrf red 2001:db8:200::/120

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    ...

Because IPv6 allows individual nexthops to be deleted without deleting
the entire route, the ip6_route_multipath_del and non-multipath code
path (ip6_route_del) have to be discriminated so that all nexthops are
only deleted for the latter case. This is done by making the existing
fc_type in fib6_config a u16 and then adding a new u16 field with
fc_delete_all_nh as the first bit.
Suggested-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ae81335

virtio_net: exploit napi_complete_done() return value · 4d6308aa

由 Eric Dumazet 提交于 2月 04, 2017

Since commit 364b6055 ("net: busy-poll: return busypolling status to
drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.

This patch changes virtio_net to use this boolean to avoid a bit of
overhead for busy-poll users.

Jason reports about 1.1% improvement for 1 byte TCP_RR (burst 100).
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d6308aa

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · a076d1bd

由 David S. Miller 提交于 2月 04, 2017

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-02-03

This series contains updates to i40e/i40evf only.

Jake fixes up the driver to not call i40e_vsi_kill_vlan() or
i40e_vsi_add_vlan() when the PVID is set or when the VID is less than 1.
Cleaned up a check which really is not needed since there is no real
reason why we cannot just call i40e_del_mac_all_vlan() directly.  Renamed
functions to better reflect their actual purpose and how they function
in a more clear manner.

Bimmy cleans up unused/deprecated macros.

Mitch cleans up unused device ids which were intended for use when
running Linux VF drivers under Hyper-V, but found to be not needed.
Then cleaned up a function that is no longer needed since the client
open and close functions were refactored.  Adds a sleep without timeout
until the reply from the PF driver has been received since the iWARP
client cannot continue until the operation has been completed.

Tushar Dave fixes an issue seen on SPARC where the use of the 'packed'
directive was causing kernel unaligned errors.

Alex does a refactor to pull some data off of the stack and store it
in the transmit buffer info section of the transmit ring.

Alan fixes a bug which was caused by passing a bad register value to the
firmware, by refactoring the macro INTRL_USEC_TO_REG into a static
inline function.  Also added feedback to the user as to the actual
interrupt rate limit being used when it differs from the requested limit.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a076d1bd

04 2月, 2017 34 次提交

net: skb_needs_check() accepts CHECKSUM_NONE for tx · 6e7bc478

由 Eric Dumazet 提交于 2月 03, 2017

My recent change missed fact that UFO would perform a complete
UDP checksum before segmenting in frags.

In this case skb->ip_summed is set to CHECKSUM_NONE.

We need to add this valid case to skb_needs_check()

Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e7bc478

net: remove support for per driver ndo_busy_poll() · 79e7fff4

由 Eric Dumazet 提交于 2月 02, 2017

We added generic support for busy polling in NAPI layer in linux-4.5

No network driver uses ndo_busy_poll() anymore, we can get rid
of the pointer in struct net_device_ops, and its use in sk_busy_loop()

Saves NETIF_F_BUSY_POLL features bit.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79e7fff4

D
enic: Remove local ndo_busy_poll() implementation. · 7a655c63
由 David S. Miller 提交于 2月 03, 2017
```
We do polling generically these days.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
7a655c63

ixgbevf: get rid of custom busy polling code · 508aac6d

由 Eric Dumazet 提交于 2月 02, 2017

In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.

Not only we remove lot's of code, we also remove one lock
operation in fast path, and allow GRO to do its job.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

508aac6d

ixgbe: get rid of custom busy polling code · 3ffc1af5

由 Eric Dumazet 提交于 2月 02, 2017

In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.

Not only we remove lot's of code, we also remove one lock
operation in fast path, and allow GRO to do its job.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ffc1af5

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 52e01b84

由 David S. Miller 提交于 2月 03, 2017

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
   sk_buff so we only access one single cacheline in the conntrack
   hotpath. Patchset from Florian Westphal.

2) Don't leak pointer to internal structures when exporting x_tables
   ruleset back to userspace, from Willem DeBruijn. This includes new
   helper functions to copy data to userspace such as xt_data_to_user()
   as well as conversions of our ip_tables, ip6_tables and arp_tables
   clients to use it. Not surprinsingly, ebtables requires an ad-hoc
   update. There is also a new field in x_tables extensions to indicate
   the amount of bytes that we copy to userspace.

3) Add nf_log_all_netns sysctl: This new knob allows you to enable
   logging via nf_log infrastructure for all existing netnamespaces.
   Given the effort to provide pernet syslog has been discontinued,
   let's provide a way to restore logging using netfilter kernel logging
   facilities in trusted environments. Patch from Michal Kubecek.

4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
   a copy&paste from the original helper, from Florian Westphal.

6) Reset netfilter state when duplicating packets, also from Florian.

7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
   nft_meta, from Liping Zhang.

8) Add missing code to deal with loopback packets from nft_meta when
   used by the netdev family, also from Liping.

9) Several cleanups on nf_tables, one to remove unnecessary check from
   the netlink control plane path to add table, set and stateful objects
   and code consolidation when unregister chain hooks, from Gao Feng.

10) Fix harmless reference counter underflow in IPVS that, however,
    results in problems with the introduction of the new refcount_t
    type, from David Windsor.

11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
    from Davide Caratti.

12) Missing documentation on nf_tables uapi header, from Liping Zhang.

13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52e01b84

Merge branch 'mlxsw-Introduce-TC-Flower-offload-using-TCAM' · e60df624

由 David S. Miller 提交于 2月 03, 2017

Jiri Pirko says:

====================
mlxsw: Introduce TC Flower offload using TCAM

This patchset introduces support for offloading TC cls_flower and actions
to Spectrum TCAM-base policy engine.

The patchset contains patches to allow work with flexible keys and actions
which are used in Spectrum TCAM.

It also contains in-driver infrastructure for offloading TC rules to TCAM HW.
The TCAM management code is simple and limited for now. It is going to be
extended as a follow-up work.

The last patch uses the previously introduced infra to allow to implement
cls_flower offloading. Initially, only limited set of match-keys and only
a drop and forward actions are supported.

As a dependency, this patchset introduces parman - priority array
area manager - as a library.

v1->v2:
- patch11:
  - use __set_bit and __test_and_clear_bit as suggested by DaveM
- patch16:
  - Added documentation to the API functions as suggested by Tom Herbert
- patch17:
  - use __set_bit and __clear_bit as suggested by DaveM
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e60df624

mlxsw: spectrum: Implement TC flower offload · 7aa0f5aa

由 Jiri Pirko 提交于 2月 03, 2017

Extend the existing setup_tc ndo call and allow to offload cls_flower
rules. Only limited set of dissector keys and actions are supported now.
Use previously introduced ACL infrastructure to offload cls_flower rules
to be processed in the HW.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7aa0f5aa

sched: cls_flower: expose priority to offloading netdevice · 69ca05ce

由 Jiri Pirko 提交于 2月 03, 2017

The driver that offloads flower rules needs to know with which priority
user inserted the rules. So add this information into offload struct.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

69ca05ce

mlxsw: spectrum: Introduce ACL core with simple TCAM implementation · 22a67766

由 Jiri Pirko 提交于 2月 03, 2017

Add ACL core infrastructure for Spectrum ASIC. This infra provides an
abstraction layer over specific HW implementations. There are two basic
objects used. One is "rule" and the second is "ruleset" which serves as a
container of multiple rules. In general, within one ruleset the rules are
allowed to have multiple priorities and masks. Each ruleset is bound to
either ingress or egress a of port netdevice.

The initial TCAM implementation is very simple and limited. It utilizes
parman lsort manager to take care of TCAM region layout.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22a67766

lib: Introduce priority array area manager · 44091d29

由 Jiri Pirko 提交于 2月 03, 2017

This introduces a infrastructure for management of linear priority
areas. Priority order in an array matters, however order of items inside
a priority group does not matter.

As an initial implementation, L-sort algorithm is used. It is quite
trivial. More advanced algorithm called P-sort will be introduced as a
follow-up. The infrastructure is prepared for other algos.

Alongside this, a testing module is introduced as well.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44091d29

list: introduce list_for_each_entry_from_reverse helper · b862815c

由 Jiri Pirko 提交于 2月 03, 2017

Similar to list_for_each_entry_continue and its reverse variant
list_for_each_entry_continue_reverse, introduce reverse helper for
list_for_each_entry_from.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b862815c

mlxsw: resources: Add ACL related resources · 8708ecf0

由 Jiri Pirko 提交于 2月 03, 2017

Add couple of resource limits related to ACL.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8708ecf0

mlxsw: spectrum: Introduce basic set of flexible key blocks · b876b9aa

由 Jiri Pirko 提交于 2月 03, 2017

Introduce basic set of Spectrum flexible key blocks. It contains blocks
needed to carry all elements defined so far.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b876b9aa

mlxsw: core: Introduce flexible actions support · 4cda7d8d

由 Jiri Pirko 提交于 2月 03, 2017

Each entry which is matched during ACL lookup points to an action set.
This action set contains up to three separate actions. If more actions
are needed to be chained, the extended set is created to hold them
in KVD linear area.

This patch implements handling of sets and encoding of actions.
Currectly, only two actions are supported. Drop and forward. Forward
action uses PBS pointer to KVD linear area, so the action code needs to
take care of this as well.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cda7d8d

mlxsw: core: Introduce flexible keys support · 3f1a84e6

由 Jiri Pirko 提交于 2月 03, 2017

Hardware supports matching on so called "flexible keys". The idea is to
assemble an optimal key to use for matching according to the fields in
packet (elements) requested by user. Certain sets of elements are
combined into pre-defined blocks. There is a picker to find needed blocks.
Keys consist of 1..n blocks.

Alongside with that, an initial portion of elements is introduced in order
to be able to offload basic cls_flower rules.

Picked keys are cached so multiple rules could share them.

There is an encode function provided that takes care of encoding key and
mask values according to given key.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f1a84e6

mlxsw: reg: Add Policy-Engine Extended Flexible Action Register · e3426e12

由 Jiri Pirko 提交于 2月 03, 2017

PEFA register is used for accessing an extended flexible action entry
in the central KVD Linear Database.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3426e12

mlxsw: reg: Add Policy-Engine Policy Based Switching Register · d120649d

由 Jiri Pirko 提交于 2月 03, 2017

The PPBS register retrieves and sets Policy Based Switching Table entries.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d120649d

mlxsw: reg: Add Policy-Engine Rules Copy Register · 937b682c

由 Jiri Pirko 提交于 2月 03, 2017

The PRCR register is used for accessing rules within a TCAM region.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

937b682c

mlxsw: reg: Add Policy-Engine Port Binding Table · af7170ee

由 Jiri Pirko 提交于 2月 03, 2017

The PPBT is used for configuration of the Port Binding Table.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af7170ee

mlxsw: reg: Add Policy-Engine TCAM Entry Register Version 2 · 0171cdec

由 Jiri Pirko 提交于 2月 03, 2017

The PTCE-V2 register is used for accessing rules within a TCAM region.
It is a new version of PTCE in order to support wider key, mask and
action within a TCAM region.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0171cdec

mlxsw: reg: Add Policy-Engine TCAM Allocation Register · d9c2661e

由 Jiri Pirko 提交于 2月 03, 2017

The PTAR register is used for allocation of regions in the TCAM.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9c2661e

mlxsw: reg: Add Policy-Engine ACL Group Table register · 10fabef5

由 Jiri Pirko 提交于 2月 03, 2017

The PAGT register is used for configuration of the ACL Group Table.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10fabef5

mlxsw: reg: Add Policy-Engine ACL Register · 3279da4c

由 Jiri Pirko 提交于 2月 03, 2017

The PACL register is used for configuration of the ACL.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3279da4c

mlxsw: item: Add helpers for getting pointer into payload for char buffer item · d5e556c6

由 Jiri Pirko 提交于 2月 03, 2017

Sometimes it is handy to get a pointer to a char buffer item and use it
direcly to write/read data. So add these helpers.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5e556c6

mlxsw: item: Add 8bit item helpers · 2946fde9

由 Jiri Pirko 提交于 2月 03, 2017

Item heplers for 8bit values are needed, let's add them.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2946fde9

bonding: Remove unnecessary returned value check · 3d67576d

由 Zhu Yanjun 提交于 2月 02, 2017

The function bond_info_query alwarys returns 0. As such, in the function
bond_do_ioctl, it is not necessary to check the returned value. So the
interface type of the function bond_info_query is changed to void. The
redundant check is removed.
Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d67576d

tcp: clear pfmemalloc on outgoing skb · 38ab52e8

由 Eric Dumazet 提交于 2月 02, 2017

Josef Bacik diagnosed following problem :

   I was seeing random disconnects while testing NBD over loopback.
   This turned out to be because NBD sets pfmemalloc on it's socket,
   however the receiving side is a user space application so does not
   have pfmemalloc set on its socket. This means that
   sk_filter_trim_cap will simply drop this packet, under the
   assumption that the other side will simply retransmit. Well we do
   retransmit, and then the packet is just dropped again for the same
   reason.

It seems the better way to address this problem is to clear pfmemalloc
in the TCP transmit path. pfmemalloc strict control really makes sense
on the receive path.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38ab52e8

cxgb4: get rid of custom busy poll code · 5226b791

由 Eric Dumazet 提交于 2月 02, 2017

In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.

Not only we remove lot of code, we also remove one spin_lock()
from driver fast path.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5226b791

myri10ge: get rid of custom busy poll code · 362108b5

由 Eric Dumazet 提交于 2月 02, 2017

Compared to custom busy_poll, the generic NAPI one is simpler and
removes a lot of code. It removes one atomic in the fast path (when
busy poll is not in action) since we do not have to use an extra
spinlock.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

362108b5

be2net: get rid of custom busy poll code · fb6113e6

由 Eric Dumazet 提交于 2月 02, 2017

Compared to custom busy_poll, the generic NAPI one is better, since
it allows to use GRO, and it removes a lot of code and extra locked
operations in fast path.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb6113e6

net: ipv6: Set protocol to kernel for local routes · 94b5e0f9

由 David Ahern 提交于 2月 02, 2017

IPv6 stack does not set the protocol for local routes, so those routes show
up with proto "none":
    $ ip -6 ro ls table local
    local ::1 dev lo proto none metric 0  pref medium
    local 2100:3:: dev lo proto none metric 0  pref medium
    local 2100:3::4 dev lo proto none metric 0  pref medium
    local fe80:: dev lo proto none metric 0  pref medium
    ...

Set rt6i_protocol to RTPROT_KERNEL for consistency with IPv4. Now routes
show up with proto "kernel":
    $ ip -6 ro ls table local
    local ::1 dev lo proto kernel metric 0  pref medium
    local 2100:3:: dev lo proto kernel metric 0  pref medium
    local 2100:3::4 dev lo proto kernel metric 0  pref medium
    local fe80:: dev lo proto kernel metric 0  pref medium
    ...
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94b5e0f9

trace: rename trace_print_hex_seq arg and add kdoc · 3898fac1

由 Daniel Borkmann 提交于 2月 02, 2017

Steven suggested to improve trace_print_hex_seq() a bit after commit
2acae0d5 ("trace: add variant without spacing in trace_print_hex_seq")
in two ways: i) by adding a kdoc comment for the helper function
itself and ii) by renaming 'spacing' argument into 'concatenate'
to better denote that we don't add spaces between each hex bytes.
Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3898fac1

MAINTAINERS: add Ivan as a switchdev maintainer · f38c5ad7

由 Jiri Pirko 提交于 2月 02, 2017

Ivan will be taking care of switchdev code from now on.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NIvan Vecera <ivecera@redhat.com>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f38c5ad7